BAKER BOTTS L.L.P. 
30 ROCKEFELLER PLAZA 
NEW YORK, NEW YORK 10112 



TO ALL WHOM IT MAY CONCERN: 

Be it known that WE, Peter Williams and Ian Gould, citizens of the United Kingdom, and
Thomas J. Taylor and Daniel J.B. Williams, citizens of the United States, whose post office
addresses are 4701 N. 56th Street, Phoenix, Arizona, 85018; 12201 S. 45th Street, Phoenix,
Arizona, 85044; 1347 W. 13th Pl., Tempe, Arizona, 85281; and 155 W. 3rd Street, Tempe,
Arizona, 85281, respectively, have made an invention in

METHOD OF DETERMINING THE NUCLEOTIDE SEQUENCE OF 
OLIGONUCLEOTIDES AND DNA MOLECULES 

of which the following is a 

SPECIFICATION 
1. INTRODUCTION 

[0001] The present invention relates to a novel method for analyzing nucleic acid sequences 
based on real-time detection of DNA polymerase-catalyzed incorporation of each of the four 
deoxynucleoside monophosphates, supplied individually and serially as deoxynucleoside 
triphosphates in a microfluidic system, to a template system comprising a DNA fragment of 
unknown sequence and an oligonucleotide primer. Incorporation of a deoxynucleoside 
monophosphate (dNMP) into the primer can be detected by any of a variety of methods 
including but not limited to fluorescence and chemiluminescence detection. Alternatively, 
microcalorimetric detection of the heat generated by the incorporation of a dNMP into the
extending primer using thermopile, thermistor and refractive index measurements can be used to
detect extension reactions. The present invention further provides a method for monitoring and 
correction of sequencing errors due to misincorporation or extension failure. 

[0002] The present invention provides a method for sequencing DNA that avoids 
electrophoretic separation of DNA fragments thus eliminating the problems associated with 
anomalous migration of DNA due to repeated base sequences or other self-complementary 
sequences which can cause single-stranded DNA to self-hybridize into hairpin loops, and also 
avoids current limitations on the size of fragments that can be read. The method of the invention 
can be utilized to determine the nucleotide sequence of genomic or cDNA fragments, or 
alternatively, as a diagnostic tool for sequencing patient derived DNA samples. 

2. BACKGROUND OF THE INVENTION 
[0003] Currently, two approaches are utilized for DNA sequence determination: the dideoxy
chain termination method of Sanger (1977, Proc. Natl. Acad. Sci. 74:5463-5467) and the
chemical degradation method of Maxam and Gilbert (1977, Proc. Natl. Acad. Sci. 74:560-564). The Sanger
dideoxy chain termination method is the most widely used method and is the method upon which 
automated DNA sequencing machines rely. In the chain termination method, DNA polymerase 
enzyme is added to four separate reaction systems to make multiple copies of a template DNA 
strand in which the growth process has been arrested at each occurrence of an A, in one set of 
reactions, and a G, C, or T, respectively, in the other sets of reactions, by incorporating in each 
reaction system one nucleotide type lacking the 3'-OH on the deoxyribose at which chain
extension occurs. This procedure produces a series of DNA fragments of different lengths, and it 
is the length of the extended DNA fragment that signals the position along the template strand at 
which each of the four bases occurs. To determine the nucleotide sequence, the DNA fragments are
separated by high resolution gel electrophoresis and the order of the four bases is read from the 
gel. 

[0004] A major research goal is to derive the DNA sequence of the entire human genome. To 
meet this goal the need has developed for new genomic sequencing technology that can dispense 
with the difficulties of gel electrophoresis, lower the costs of performing sequencing reactions, 
including reagent costs, increase the speed and accuracy of sequencing, and increase the length 
of sequence that can be read in a single step. Potential improvements in sequencing speed may 
be provided by a commercialized capillary gel electrophoresis technique such as that described 
in Marshall and Pennisi (1998, Science 280:994-995). However, a major problem common to
all gel electrophoresis approaches is the occurrence of DNA sequence compressions, usually 
arising from secondary structures in the DNA fragment, which result in anomalous migration of 
certain DNA fragments through the gel. 

[0005] As genomic information accumulates and the relationships between gene mutations and 
specific diseases are identified, there will be a growing need for diagnostic methods for 
identification of mutations. In contrast to the large scale methods needed for sequencing large 
segments of the human genome, what is needed for diagnostic methods are repetitive, low-cost, 
highly accurate techniques for resequencing of certain small isolated regions of the genome. In 
such instances, methods of sequencing based on gel electrophoresis readout become far too slow 
and expensive. 
[0006] When considering novel DNA sequencing techniques, the possibility of reading the
sequence directly, much as the cell does, rather than indirectly as in the Sanger
dideoxynucleotide approach, is a preferred goal. This was the goal of early, unsuccessful
attempts to determine the shapes of the individual nucleotide bases with scanning probe
microscopes.

[0007] Another approach for reading a nucleotide sequence directly is to treat the
DNA with an exonuclease coupled with a detection scheme for identifying each nucleotide
sequentially released, as described in Goodwin et al. (1995, Experimental Techniques of Physics
41:279-294). However, researchers using this technology are confronted with the enormous 
problem of detecting and identifying single nucleotide molecules as they are digested from a 
single DNA strand. Simultaneous exonuclease digestion of multiple DNA strands to yield larger 
signals is not feasible because the enzymes rapidly get out of phase, so that nucleotides from 
different positions on the different strands are released together, and the sequences become 
unreadable. It would be highly beneficial if some means of external regulation of the 
exonuclease could be found so that multiple enzyme molecules could be compelled to operate in 
phase. However, external regulation of an enzyme that remains docked to its polymeric substrate 
is exceptionally difficult, if not impossible, because after each digestion the next substrate 
segment is immediately present at the active site. Thus, any controlling signal must be present at 
the active site at the start of each reaction. 

[0008] A variety of methods may be used to detect the polymerase-catalyzed incorporation of 
deoxynucleoside monophosphates (dNMPs) into a primer at each template site. For example, 
the pyrophosphate released whenever DNA polymerase adds one of the four dNTPs onto a 
primer 3' end may be detected using chemiluminescence-based detection of the pyrophosphate as
described in Hyman E.D. (1988, Analytical Biochemistry 174:423-436) and U.S. Patent No.
4,971,903. This approach has been utilized most recently in a sequencing approach referred to
as "sequencing by incorporation" as described in Ronaghi (1996, Analytical Biochem. 242:84)
and Ronaghi (1998, Science 281:363-365). However, this approach faces two key problems:
destruction of unincorporated nucleotides and detection of pyrophosphate.
The solution to the first problem is to destroy the added, unincorporated nucleotides using a
dNTP-digesting enzyme such as apyrase. The solution to the second is the detection of the
pyrophosphate using ATP sulfurylase to reconvert the pyrophosphate to ATP, which can be
detected by a luciferase chemiluminescent reaction as described in U.S. Patent No. 4,971,903
and Ronaghi (1998, Science 281:363-365). Deoxyadenosine α-thiotriphosphate is used instead
of dATP to minimize direct interaction of injected dATP with the luciferase.

[0009] Unfortunately, the requirement for multiple enzyme reactions to be completed in each
cycle imposes restrictions on the speed of this approach, while the read length is limited by the
impossibility of completely destroying unincorporated, non-complementary nucleotides. If
some residual amount of one nucleotide remains in the reaction system at the time when a fresh 
aliquot of a different nucleotide is added for the next extension reaction, there exists a possibility 
that some fraction of the primer strands will be extended by two or more nucleotides, the added 
nucleotide type and the residual impurity type, if these match the template sequence, and so this 
fraction of the primer strands will then be out of phase with the remainder. This out of phase 
component produces an erroneous incorporation signal which grows larger with each cycle and 
ultimately makes the sequence unreadable. 
[0010] A different direct sequencing approach, which uses dNTPs tagged at the 3'-OH position
with four different colored fluorescent tags, one for each of the four nucleotides, is described in
Metzger, M.L., et al. (1994, Nucleic Acids Research 22:4259-4267). In this approach, the
primer/template duplex is contacted with all four dNTPs simultaneously. Incorporation of a
3'-tagged NMP blocks further chain extension. The excess and unreacted dNTPs are flushed away
and the incorporated nucleotide is identified by the color of the incorporated fluorescent tag. The
fluorescent tag must then be removed in order for a subsequent incorporation reaction to occur.
Similar to the pyrophosphate detection method, incomplete removal of a blocking fluorescent tag
leaves some primer strands unextended on the next reaction cycle, and if these are subsequently
unblocked in a later cycle, once again an out-of-phase signal is produced which grows larger
with each cycle and ultimately limits the read length. To date, this method has been
demonstrated to work for only a single base extension. Thus, this method is slow and is likely to
be restricted to very short read lengths, because 99% efficiency in removal of the tag is
required to read beyond 50 base pairs. Incomplete removal of the label results in out-of-phase
extended DNA strands.
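The read-length limit described above follows from compounding: if a fixed fraction of strands fails to be unblocked in every cycle, the in-phase population shrinks geometrically. The sketch below is illustrative only; it assumes a constant, independent per-cycle tag-removal efficiency and assumes that a strand missing a single removal stays out of phase. The function names `in_phase_fraction` and `max_cycles` are hypothetical.

```python
import math


def in_phase_fraction(efficiency: float, cycles: int) -> float:
    """Fraction of strands still in phase after `cycles` cycles, assuming each
    cycle succeeds independently with probability `efficiency` and that one
    failure leaves a strand permanently out of phase (simplifying assumption)."""
    return efficiency ** cycles


def max_cycles(efficiency: float, min_in_phase: float) -> int:
    """Largest number of cycles for which the in-phase fraction stays at or
    above `min_in_phase`, under the same assumptions."""
    return math.floor(math.log(min_in_phase) / math.log(efficiency))


if __name__ == "__main__":
    print(f"in phase after 50 cycles: {in_phase_fraction(0.99, 50):.3f}")
    print(f"cycles before in-phase fraction drops below 0.5: {max_cycles(0.99, 0.5)}")
```

Under these assumptions, 99% removal efficiency leaves about 60% of strands in phase after 50 cycles, so roughly 40% of the strands contribute out-of-phase signal; the in-phase fraction falls below one half after about 69 cycles.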

3. SUMMARY OF THE INVENTION 
[0011] Accordingly, it is an object of the present invention to provide a novel method for 
determining the nucleotide sequence of a DNA fragment which eliminates the need for 
electrophoretic separation of DNA fragments. The inventive method, referred to herein as 
"reactive sequencing", is based on detection of DNA polymerase catalyzed incorporation of each 
of the four nucleotide types, when deoxynucleoside triphosphates (dNTPs) are supplied
individually and serially to a DNA primer/template system. The DNA primer/template system 
comprises a single stranded DNA fragment of unknown sequence, an oligonucleotide primer that
forms a matched duplex with a short region of the single stranded DNA, and a DNA polymerase 
enzyme. The enzyme may either be already present in the template system, or may be supplied 
together with the dNTP solution. 

[0012] Typically, a single deoxynucleoside triphosphate (dNTP) is added to the DNA
primer/template system and allowed to react. As used herein, "deoxyribonucleotide" means and includes,
in addition to dGTP, dCTP, dATP, dTTP, chemically modified versions of these 
deoxyribonucleotides or analogs thereof. Such chemically modified deoxyribonucleotides 
include but are not limited to those deoxyribonucleotides tagged with a fluorescent or 
chemiluminescent moiety. Analogs of deoxyribonucleotides that may be used include but are 
not limited to 7-deazapurine. The present invention additionally provides a method for 
improving the purity of deoxynucleotides used in the polymerase reaction. 

[0013] An extension reaction will occur only when the incoming dNTP base is complementary 
to the next unpaired base of the DNA template beyond the 3' end of the primer. While the 
reaction is occurring, or after a delay of sufficient duration to allow a reaction to occur, the 
system is tested to determine whether an additional nucleotide derived from the added dNTP has 
been incorporated into the DNA primer/template system. A correlation between the dNTP added 
to the reaction cell and detection of an incorporation signal identifies the nucleotide incorporated 
into the primer/template. The amplitude of the incorporation signal identifies the number of 
nucleotides incorporated, and thereby quantifies single base repeat lengths where these occur. 
By repeating this process with each of the four nucleotides individually, the sequence of the 
template can be directly read in the 5' to 3' direction one nucleotide at a time.
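The cycle described above can be sketched as a simple flow model. This is a minimal illustration, assuming ideal chemistry (every strand extends completely, with no misincorporation or extension failure) and writing the template 3'→5', i.e., in the order the polymerase reads it. The names `simulate_flows` and `COMPLEMENT` are hypothetical and are not part of the described apparatus.

```python
from typing import List, Tuple

# Watson-Crick pairing: template base -> dNTP incorporated opposite it.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_flows(template_3to5: str, flow_order: str, n_cycles: int) -> List[Tuple[str, int]]:
    """Return (dNTP supplied, nucleotides incorporated) for each flow.

    A flow extends the primer only if the supplied dNTP is complementary to the
    next unpaired template base; it then runs through the whole single-base
    repeat, so the count equals the repeat length (0 for a non-matching flow).
    """
    pos = 0  # index of the next unpaired template base
    results = []
    for _ in range(n_cycles):
        for dntp in flow_order:
            count = 0
            while pos < len(template_3to5) and COMPLEMENT[template_3to5[pos]] == dntp:
                pos += 1
                count += 1
            results.append((dntp, count))
    return results


if __name__ == "__main__":
    for dntp, count in simulate_flows("TTGCA", "ACGT", 1):
        print(dntp, count)
```

For the template 3'-TTGCA-5' and the flow order A, C, G, T, the dATP flow incorporates two nucleotides (opposite the TT repeat) and each remaining flow incorporates one; in the method described, the incorporation signal amplitude would scale with these counts.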
[0014] Detection of the polymerase-mediated extension reaction and quantification of the
extent of reaction can occur by a variety of different techniques, including but not limited to,
microcalorimetric detection of the heat generated by the incorporation of a nucleotide into
the extending duplex. Optical detection of an extension reaction by fluorescence or 
chemiluminescence may also be used to detect incorporation of nucleotides tagged with 
fluorescent or chemiluminescent entities into the extending duplex. Where the incorporated 
nucleotide is tagged with a fluorophore, excess unincorporated nucleotide is removed, and the 
template system is illuminated to stimulate fluorescence from the incorporated nucleotide. The 
fluorescent tag may then be cleaved and removed from the DNA template system before a 
subsequent incorporation cycle begins. A similar process is followed for chemiluminescent tags, 
with the chemiluminescent reaction being stimulated by introducing an appropriate reagent into 
the system, again after excess unreacted tagged dNTP has been removed; however, 
chemiluminescent tags are typically destroyed in the process of readout and so a separate 
cleavage and removal step following detection may not be required. For either type of tag, 
fluorescent or chemiluminescent, the tag may also be cleaved after incorporation and transported 
to a separate detection chamber for fluorescent or chemiluminescent detection. In this way, 
fluorescent quenching by adjacent fluorophore tags incorporated in a single base repeat sequence 
may be avoided. In addition, this may protect the DNA template system from possible radiation 
damage in the case of fluorescent detection or from possible chemical damage in the case of 
chemiluminescent detection. Alternatively the fluorescent tag may be selectively destroyed by a 
chemical or photochemical reaction. This process eliminates the need to cleave the tag after each 
readout, or to detach and transport the tag from the reaction chamber to a separate detection 
chamber for fluorescent detection. The present invention provides a method for selective 
destruction of a fluorescent tag by a photochemical reaction with diphenyliodonium ions or
related species. 

[0015] The present invention further provides a reactive sequencing method that utilizes a two 
cycle system. An exonuclease-deficient polymerase is used in the first cycle and a mixture of
exonuclease-deficient and exonuclease-proficient enzymes is used in the second cycle. In the
first cycle, the template-primer system together with an exonuclease-deficient polymerase is
presented sequentially with each of the four possible nucleotides. In the second cycle, after
identification of the correct nucleotide, a mixture of exonuclease-proficient and -deficient
polymerases, or a polymerase containing both types of activity, is added
together with the correct dNTP identified in the first cycle to complete and proofread the primer
extension. In this way, an exonuclease-proficient polymerase is only present in the reaction cell 
when the correct dNTP is present, so that exonucleolytic degradation of correctly extended 
strands does not occur, while degradation and correct re-extension of previously incorrectly 
extended strands does occur, thus achieving extremely accurate strand extension. 

[0016] The present invention also provides a method for monitoring reactive sequencing 
reactions to detect and correct sequencing reaction errors resulting from misincorporation, i.e., 
incorrectly incorporating a non-complementary base, and extension failure, i.e., failure to extend 
a fraction of the DNA primer strands. The method is based on the ability to (i) determine the 
size of the trailing strand population (trailing strands are those primer strands which have 
undergone an extension failure at any extension prior to the current reaction step); (ii) determine 
the downstream sequence of the trailing strand population between the 3' terminus of the trailing 
strands and the 3' terminus of the corresponding leading strands ("downstream" refers to the 
template sequence beyond the current 3' terminus of a primer strand; correspondingly, 
"upstream" refers to the known template and complementary primer sequence towards the 5' end 
of the primer strand; "leading strands" are those primer strands which have not previously 
undergone extension failure); and (iii) predict at each extension step the signal to be expected 
from the extension of the trailing strands through simulation of the occurrence of an extension 
failure at any point upstream from the 3' terminus of the leading strand. Subtraction of the 
predicted signal from the measured signal yields a signal due only to valid extension of the 
leading strand population. 
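The three steps above (sizing the trailing population, using its downstream sequence, and predicting its signal for subtraction) can be sketched with a simple population model. This is a hypothetical illustration, not the patented procedure: it assumes a constant extension-failure fraction `fail` at every flow that could extend a population, ignores misincorporation, and writes the template 3'→5'. In practice `template` would be the sequence already called upstream of the leading strands, which is all the trailing-strand prediction needs. All names are illustrative.

```python
from collections import defaultdict
from typing import List, Tuple

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def run_length(template: str, pos: int, dntp: str) -> int:
    """Number of consecutive template bases from `pos` that pair with `dntp`."""
    n = 0
    while pos + n < len(template) and COMPLEMENT[template[pos + n]] == dntp:
        n += 1
    return n


def simulate(template: str, flows: str, fail: float) -> Tuple[List[float], List[float], List[float]]:
    """Split each flow's signal into leading- and trailing-strand parts.

    Populations are keyed by (position, has_failed). At each flow that could
    extend a population, a fraction `fail` of it undergoes extension failure
    (assumed model) and stays behind as trailing strands.
    Returns (leading signal, trailing signal, leading fraction before each flow).
    """
    pops = {(0, False): 1.0}
    lead_sig, trail_sig, lead_frac = [], [], []
    for dntp in flows:
        lead_frac.append(sum(w for (_, failed), w in pops.items() if not failed))
        new = defaultdict(float)
        s_lead = s_trail = 0.0
        for (pos, failed), w in pops.items():
            n = run_length(template, pos, dntp)
            if n == 0:
                new[(pos, failed)] += w
                continue
            extended = w * (1.0 - fail)
            new[(pos + n, failed)] += extended
            new[(pos, True)] += w - extended  # extension failure -> trailing
            if failed:
                s_trail += extended * n
            else:
                s_lead += extended * n
        pops = dict(new)
        lead_sig.append(s_lead)
        trail_sig.append(s_trail)
    return lead_sig, trail_sig, lead_frac


def corrected_counts(measured: List[float], template: str, flows: str, fail: float) -> List[int]:
    """Subtract the predicted trailing-strand signal, then normalize by the
    signal expected for one incorporation on the leading strands."""
    _, trail_pred, lead_frac = simulate(template, flows, fail)
    return [round((m - t) / (f * (1.0 - fail)))
            for m, t, f in zip(measured, trail_pred, lead_frac)]


if __name__ == "__main__":
    template, flows, fail = "TTGCAGTC", "ACGT" * 3, 0.05
    lead, trail, _ = simulate(template, flows, fail)
    measured = [a + b for a, b in zip(lead, trail)]  # noise-free synthetic data
    print(corrected_counts(measured, template, flows, fail))
```

The synthetic "measured" signal here is generated by the same model, so the example only shows the bookkeeping of subtraction; real data would add noise and model mismatch.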

[0017] In a preferred embodiment of the invention, the monitoring for reactive sequencing 
reaction errors is computer-aided. The ability to monitor extension failures permits 
determination of the point to which the trailing strands for a given template sequence have 
advanced and the sequence in the 1, 2 or 3 base gap between these strands and the leading 
strands. Knowing this information the dNTP probe cycle can be altered to selectively extend the 
trailing strands for a given template sequence while not extending the leading strands, thereby 
resynchronizing the populations. 
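Selecting a dNTP probe cycle that extends only the trailing strands can be sketched as follows. This is a hypothetical helper under simplifying assumptions: the gap is given as the template bases between the trailing and leading 3' ends (read 3'→5'), a single flow runs through a whole single-base repeat, and any flow whose dNTP pairs with the leading strands' next template base is disallowed because it would also extend the leading strands.

```python
from typing import List, Optional

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def resync_flows(gap_3to5: str, lead_next_base: str) -> Optional[List[str]]:
    """Return a dNTP flow sequence that advances trailing strands across the
    gap without extending the leading strands, or None if a required dNTP
    would also extend the leading strands (simple assumed rule)."""
    forbidden = COMPLEMENT[lead_next_base]  # dNTP that would extend leading strands
    flows: List[str] = []
    for base in gap_3to5:
        dntp = COMPLEMENT[base]
        if dntp == forbidden:
            return None
        if not flows or flows[-1] != dntp:  # one flow covers a single-base repeat
            flows.append(dntp)
    return flows


if __name__ == "__main__":
    print(resync_flows("TG", "A"))  # two-base gap; leading strands next need dTTP
```

If the rule returns None, the trailing strands cannot be advanced by this simple scheme without also extending the leading strands.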

[0018] The present invention further provides an apparatus for DNA sequencing comprising: 
(a) at least one chamber including a DNA primer/template system which produces a detectable 
signal when a DNA polymerase enzyme incorporates a deoxyribonucleotide monophosphate 
onto the 3' end of the primer strand; (b) means for introducing into, and evacuating from, the 
reaction chamber at least one selected from the group consisting of buffers, electrolytes, DNA 
template, DNA primer, deoxyribonucleotides, and polymerase enzymes; (c) means for 
amplifying said signal; and (d) means for converting said signal into an electrical signal. 
4. BRIEF DESCRIPTION OF THE DRAWINGS 
[0019] Further objects and advantages of the invention will be apparent from a reading of the 
following description in conjunction with the accompanying drawings, in which: 
[0020] Figure 1 is a schematic diagram illustrating a reactive sequencing device containing a 
thin film bismuth antimony thermopile in accordance with the invention; 

[0021] Figure 2 is a schematic diagram of a reactive sequencing device containing a thermistor 
in accordance with the invention; 

[0022] Figure 3 is a schematic diagram illustrating a representative embodiment of 
microcalorimetry detection of a DNA polymerase reaction in accordance with the invention; 
[0023] Figure 4 is an electrophoretic gel showing a time course for primer extension assays 
catalyzed by T4 DNA polymerase mutants; 

[0024] Figure 5 is a schematic diagram illustrating a nucleotide attached to a fluorophore by a 
benzoin ester which is a photocleavable linker for use in the invention; 

[0025] Figure 6 is a schematic illustration of a nucleotide attached to a chemiluminescent tag 
for use in the invention; 

[0026] Figure 7 is a schematic diagram of a nucleotide attached to a chemiluminescent tag by a 
cleavable linkage; 

[0027] Figures 8(a) and 8(b) are schematic diagrams of a mechanical fluorescent sequencing
method in accordance with the invention in which a DNA template and primer are adsorbed on
beads captured behind a porous frit; and 

[0028] Figure 9 is a schematic diagram of a sequencing method in accordance with the 
invention utilizing a two cycle system. 
[0029] Figure 10 is a diagram of the mechanism of photochemical degradation of fluorescein 
by diphenyliodonium ion (DPI). 

[0030] Figure 11 shows fluorescence spectra of equimolar concentrations of fluorescein and
tetramethylrhodamine dyes before and after addition of a solution of diphenyliodonium chloride.

[0031] Figure 12 shows the UV absorption spectra obtained from (1) fluorescein and (2)
fluorescein + DPI after a single flash from a xenon camera strobe. 

[0032] Figure 13 displays the fluorescence spectra from single nucleotide polymerase reactions 
with DPI photobleaching between incorporation reactions. 

[0033] Figure 14A-D. Simulation of Reactive Sequencing of [CTGA] GAA ACC AGA AAG 
TCC [T], probed with a dNTP cycle. 14A. Sequence readout close to the primer where no 
extension failure has occurred. 14B. Sequence readout downstream of primer where 60% of the 
strands have undergone extension failure and are producing out of phase signals and 
misincorporation has prevented extension on 75% of all strands. 14C. Downstream readout with 
error signals from trailing strands (dark shading) distinguished from correct readout signals from 
leading strands (light shading) using knowledge of the downstream sequence of the trailing 
strands. 14D. Corrected sequence readout following subtraction of error signals from trailing 
strands. Note the similarity to the data of Fig. 14A. 

[0034] Figure 15. Effect of a leading strand population on extension signals. 

5. DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
[0035] The present invention provides a method for determining the nucleic acid sequence of a 
DNA molecule based on detection of successive single nucleotide DNA polymerase mediated 
extension reactions. As described in detail below, in one embodiment, a DNA primer/template 
system comprising a polynucleotide primer complementary to and bound to a region of the DNA 
to be sequenced is constrained within a reaction cell into which buffer solutions containing 
various reagents necessary for a DNA polymerase reaction to occur are added. Into the reaction 
cell, a single type of deoxynucleoside triphosphate (dNTP) is added. Depending on the identity 
of the next complementary site in the DNA primer/template system, an extension reaction will 
occur only when the appropriate nucleotide is present in the reaction cell. A correlation between 
the nucleotide present in the reaction cell and detection of an incorporation signal identifies the 
next nucleotide of the template. Following each extension reaction, the reaction cell is flushed 
with dNTP-free buffer, retaining the DNA primer/template system, and the cycle is repeated 
until the entire nucleotide sequence is identified. 

[0036] The present invention is based on the existence of a control signal within the active site
of DNA polymerases, which distinguish, with high fidelity, complementary and
non-complementary fits of incoming deoxynucleotide triphosphates to the base on the template
strand at the primer extension site (i.e., they read the sequence) and incorporate at that site only
the one type of deoxynucleotide that is complementary. That is, if the available nucleotide type
is not complementary to the next template site, the polymerase is inactive; thus, the template
sequence is the DNA polymerase control signal. Therefore, by contacting a DNA polymerase
system with a single nucleotide type rather than all four, the next base in the sequence can be
identified by detecting whether or not a reaction occurs. Further, single base repeat lengths can
be quantified by quantifying the extent of reaction.
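Reading the sequence from the per-flow extent of reaction can be sketched as a quantization step. This minimal example assumes a calibrated signal for one incorporated nucleotide (`unit_signal`) and no out-of-phase contribution; the functions `decode` and `template_5to3` are hypothetical names for illustration.

```python
from typing import List

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def decode(flow_order: str, amplitudes: List[float], unit_signal: float) -> str:
    """Convert per-flow incorporation signals into the synthesized strand (5'->3').

    Each amplitude is divided by the assumed single-incorporation signal and
    rounded, giving the single-base repeat length for that flow (0 = no reaction).
    """
    parts = []
    for i, amp in enumerate(amplitudes):
        dntp = flow_order[i % len(flow_order)]
        repeat = max(0, round(amp / unit_signal))
        parts.append(dntp * repeat)
    return "".join(parts)


def template_5to3(extension: str) -> str:
    """Template sequence (5'->3') that pairs with the synthesized extension."""
    return "".join(COMPLEMENT[b] for b in reversed(extension))


if __name__ == "__main__":
    ext = decode("ACGT", [2.1, 0.9, 1.05, 0.0, 0.95], unit_signal=1.0)
    print(ext, template_5to3(ext))
```

Rounding to the nearest integer is the simplest choice; repeat lengths become harder to resolve as the relative difference between n and n+1 incorporations shrinks for long repeats.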

[0037] As a first step in the practice of the inventive method, single-stranded template DNA to 
be sequenced is prepared using any of a variety of different methods known in the art. Two types 
of DNA can be used as templates in the sequencing reactions. Pure single-stranded DNA, such as
that obtained from recombinant bacteriophage, can be used. The use of bacteriophage provides a
method for producing large quantities of pure single-stranded template. Alternatively,
single-stranded DNA may be derived from double-stranded DNA that has been denatured by
heat or alkaline conditions, as described in Chen and Seeburg (1985, DNA 4:165); Hattori and
Sakaki (1986, Anal. Biochem. 152:232); and Mierendorf and Pfeffer (1987, Methods Enzymol.
152:556). Such double-stranded DNA includes, for example, DNA samples
derived from patients to be used in diagnostic sequencing reactions.

[0038] The template DNA can be prepared by various techniques well known to those of skill 
in the art. For example, template DNA can be prepared as vector inserts using any conventional 
cloning methods, including those used frequently for sequencing. Such methods can be found in 
Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition (Cold Spring Harbor 
Laboratories, New York, 1989). In a preferred embodiment of the invention, polymerase chain 
reactions (PCR) may be used to amplify fragments of DNA to be used as template DNA as 
described in Innis et al., ed. PCR Protocols (Academic Press, New York, 1990). 

[0039] The amount of DNA template needed for accurate detection of the polymerase reaction 
will depend on the detection technique used. For example, for optical detection, e.g., 
fluorescence or chemiluminescence detection, relatively small quantities of DNA in the 
femtomole range are needed. For thermal detection, quantities approaching one picomole may be
required to detect the change in temperature resulting from a DNA polymerase mediated 
extension reaction. 
[0040] In enzymatic sequencing reactions, the priming of DNA synthesis is achieved by the 
use of an oligonucleotide primer with a base sequence that is complementary to, and therefore 
capable of binding to, a specific region on the template DNA sequence. In instances where the 
template DNA is obtained as single stranded DNA from bacteriophage, or as double stranded 
DNA derived from plasmids, "universal" primers that are complementary to sequences in the 
vectors, i.e., the bacteriophage, cosmid and plasmid vectors, and that flank the template DNA, 
can be used. 

[0041] Primer oligonucleotides are chosen to form highly stable duplexes that bind to the 
template DNA sequences and remain intact during any washing steps during the extension 
cycles. Preferably, the length of the primer oligonucleotide is from 18-30 nucleotides and 
contains a balanced base composition. The structure of the primer should also be analyzed to 
confirm that it does not contain regions of dyad symmetry which can fold and self-anneal to form
secondary structures, thereby rendering the primers inefficient. Appropriate hybridization
conditions for binding of the oligonucleotide primers in the template systems will depend on
the primer sequence and are well known to those of skill in the art.
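The primer criteria above can be screened with a short script. This is an illustrative sketch: the 18-30 nucleotide length comes from the text, while the 40-60% GC window used for "balanced base composition" and the 4-nucleotide threshold for a self-complementary stretch are assumptions, and the dyad check is a crude substring test rather than a thermodynamic folding model. All function names are hypothetical.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def longest_self_complementary_stretch(primer: str) -> int:
    """Length of the longest substring whose reverse complement also occurs in
    the primer: a crude indicator of dyad symmetry (possible hairpin stems)."""
    rc = reverse_complement(primer)
    best = 0
    for i in range(len(primer)):
        # Only test substrings longer than the current best; if a prefix is
        # absent from rc, every longer substring from i is absent too.
        for j in range(i + best + 1, len(primer) + 1):
            if primer[i:j] in rc:
                best = j - i
            else:
                break
    return best


def check_primer(primer: str, min_len: int = 18, max_len: int = 30,
                 gc_range: Tuple[float, float] = (0.40, 0.60), max_stem: int = 4) -> List[str]:
    """Return a list of issues; an empty list means the primer passes this screen."""
    primer = primer.upper()
    issues = []
    if not min_len <= len(primer) <= max_len:
        issues.append(f"length {len(primer)} outside {min_len}-{max_len} nt")
    gc = sum(b in "GC" for b in primer) / len(primer)
    if not gc_range[0] <= gc <= gc_range[1]:
        issues.append(f"GC content {gc:.0%} outside assumed balanced range")
    stem = longest_self_complementary_stretch(primer)
    if stem >= max_stem:
        issues.append(f"self-complementary stretch of {stem} nt (possible dyad symmetry)")
    return issues


if __name__ == "__main__":
    print(check_primer("ACGT" * 5))
```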

[0042] In utilizing the reactive sequencing method of the invention, a variety of different DNA 
polymerases may be used to incorporate dNTPs onto the 3' end of the primer which is hybridized 
to the template DNA molecule. Such DNA polymerases include but are not limited to Taq 
polymerase, T7 or T4 polymerase, and Klenow polymerase. In a preferred embodiment of the 
invention, described in detail below, DNA polymerases lacking 3'-5' exonuclease proofreading
activity are used in the sequencing reactions. For the most rapid reaction kinetics, the amount of 
polymerase is sufficient to ensure that each DNA molecule carries a non-covalently attached 
polymerase molecule during reaction. For a typical dissociation constant of $K_d \approx 50$ nM for the
equilibrium

$$\mathrm{DNA{\cdot}Pol} \rightleftharpoons \mathrm{DNA} + \mathrm{Pol}, \qquad K_d \approx 50\ \mathrm{nM},$$

the desired condition is:

$$[\mathrm{Pol}] > 50\ \mathrm{nM} + [\mathrm{DNA}].$$
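The effect of the polymerase concentration can be checked with the exact 1:1 binding equilibrium. This sketch assumes simple single-site binding and treats [Pol] and [DNA] as total concentrations; the function name `fraction_dna_bound` is hypothetical, and the default dissociation constant of 50 nM is taken from the typical value given in the text.

```python
import math


def fraction_dna_bound(pol_total_nM: float, dna_total_nM: float, kd_nM: float = 50.0) -> float:
    """Fraction of DNA carrying a bound polymerase at equilibrium for
    DNA.Pol <=> DNA + Pol with dissociation constant kd_nM (1:1 binding assumed).

    The complex concentration C satisfies C**2 - (P + D + Kd) * C + P * D = 0;
    the physical (smaller) root is computed in the numerically stable form
    C = 2PD / (S + sqrt(S**2 - 4PD)), with S = P + D + Kd.
    """
    p, d, kd = pol_total_nM, dna_total_nM, kd_nM
    s = p + d + kd
    complex_nM = 2.0 * p * d / (s + math.sqrt(s * s - 4.0 * p * d))
    return complex_nM / d


if __name__ == "__main__":
    for pol in (60.0, 150.0, 500.0):
        print(pol, round(fraction_dna_bound(pol, 10.0), 3))
```

In this model, setting [Pol] exactly equal to 50 nM + [DNA] leaves about half of the DNA bound (about 52% for 10 nM DNA), because the free polymerase concentration is then close to $K_d$; a larger excess drives the bound fraction toward one, e.g. about 91% at 500 nM polymerase and 10 nM DNA.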

[0043] In addition, reverse transcriptase which catalyzes the synthesis of single stranded DNA 
from an RNA template may be utilized in the reactive sequencing method of the invention to 
sequence messenger RNA (mRNA). Such a method comprises sequentially contacting an RNA 
template annealed to a primer (RNA primer/template) with dNTPs in the presence of reverse 
transcriptase enzyme to determine the sequence of the RNA. Because mRNA is produced by 
RNA polymerase-catalyzed synthesis from a DNA template, and thus contains the sequence 
information of the DNA template strand, sequencing the mRNA yields the sequence of the DNA 
gene from which it was transcribed. Eukaryotic mRNAs have poly(A) tails and therefore the 
primer for reverse transcription can be an oligo(dT). Typically, it will be most convenient to 
synthesize the oligo(dT) primer with a terminal biotin or amino group through which the primer 
can be captured on a substrate and subsequently hybridize to and capture the template mRNA 
strand. 

[0044] The extension reactions are carried out in buffer solutions which contain the 
appropriate concentrations of salts, dNTPs and DNA polymerase required for the DNA 
polymerase mediated extension to proceed. For guidance regarding such conditions see, for 
example, Sambrook et al., (1989, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor 
Press, N.Y.); and Ausubel et al. (1989, Current Protocols in Molecular Biology, Greene
Publishing Associates and Wiley Interscience, N.Y.).

[0045] Typically, buffer containing one of the four dNTPs is added into a reaction cell. 
Depending on the identity of the nucleoside base at the next unpaired template site in the 
primer/template system, a reaction will occur when the reaction cell contains the appropriate 
dNTP. When the reaction cell contains any one of the other three incorrect dNTPs, no reaction 
will take place. 

[0046] The reaction cell is then flushed with dNTP free buffer and the cycle is repeated until a 
complete DNA sequence is identified. Detection of a DNA polymerase mediated extension can 
be made using any of the detection methods described in detail below including optical and 
thermal detection of an extension reaction. 

[0047] In some instances, a nucleotide solution is found to be contaminated with any of the 
other three nucleotides. In such instances a small fraction of strands may be extended by 
incorporation of an impurity dNTP when the dNTP type supplied is incorrect for extension, 
producing a population of strands which are subsequently extended ahead of the main strand 
population. Thus, in an embodiment of the invention, each nucleotide solution can be treated to
remove any contaminating nucleotides. Treatment of each nucleotide solution involves reaction
of the solution, prior to use, with immobilized DNA complementary to each of the possibly
contaminating nucleotides. For example, a dATP solution will be allowed to react with
immobilized poly(dA), poly(dG) and poly(dC), with appropriate primers and polymerase, for a
time sufficient to incorporate any contaminating dTTP, dCTP and dGTP nucleotides into DNA. 
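The pairing rule behind the dATP example applies to any nucleotide stock: a contaminant whose base is X is consumed only opposite a template base complementary to X. The following Python sketch is illustrative only; the function name `scavenger_templates` is hypothetical and is not part of the described method.

```python
# Watson-Crick pairing: template base -> base the polymerase adds opposite it
# (the mapping is symmetric, so it also gives the template needed for a base).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def scavenger_templates(dntp: str) -> dict[str, str]:
    """Map each possible contaminant of a dNTP stock to the immobilized
    homopolymer template that would consume it during pre-treatment.

    A contaminating dNTP with base X is incorporated opposite template
    base PAIR[X], so it is scavenged by poly(d<PAIR[X]>).
    """
    supplied = dntp[1]  # e.g. "dATP" -> "A"
    return {
        f"d{base}TP": f"poly(d{PAIR[base]})"
        for base in "ATGC"
        if base != supplied
    }


print(scavenger_templates("dATP"))
```

For a dATP stock this yields poly(dA) for dTTP, poly(dC) for dGTP and poly(dG) for dCTP, matching the three templates named above.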
[0048] In a preferred embodiment of the invention, the primer/template system comprises the
template DNA tethered to a solid phase support, which permits the sequential application of
sequencing reaction reagents, i.e., buffers, dNTPs and DNA polymerase, without complicated and
time-consuming purification steps following each extension reaction. Preferably, the template
DNA is covalently attached to a solid phase support, such as the surface of a reaction flow cell, a
polymeric microsphere, filter material, or the like. Alternatively, for applications that require
sequencing of many samples containing the same vector template or same gene, for example, in
diagnostic applications, a universal primer may be tethered to a support, and the template DNA
allowed to hybridize to the immobilized primer.

[0049] The DNA may be modified to facilitate covalent or non-covalent tethering of the DNA
to a solid phase support. For example, when PCR is used to amplify DNA fragments, the 5'
ends of one set of PCR primer oligonucleotides may be modified to carry a linker moiety
for tethering one of the two complementary types of DNA strands produced to a solid phase
support. Such linker moieties include, for example, biotin. When using biotin, the biotinylated
DNA fragments may be bound non-covalently to streptavidin covalently attached to the solid
phase support. Alternatively, an amino group (-NH2) may be chemically incorporated into one
of the PCR primer strands and used to covalently link the DNA template to a solid phase support
using standard chemistry, such as reactions with N-hydroxysuccinimide-activated agarose
surfaces.



NY02:344421.1
31734 PCT-USA-A 



[0050] In another embodiment, the 5' end of the sequencing oligonucleotide primer may be
modified with biotin, for non-covalent capture to a streptavidin-treated support, or with an amino
group for chemical linkage to a solid support; the template strands are then captured by
hybridization (non-covalent base pairing) between the immobilized primer sequence and the
complementary sequence on the template strands. Methods for immobilizing DNA on a solid
phase support are well known to those of skill in the art and will vary depending on the solid
phase support chosen.

[0051] In the reactive sequencing method of the present invention, DNA polymerase is
presented sequentially with each of the four dNTPs. In the majority of the reaction cycles, only
incorrect dNTPs will be present, thereby increasing the likelihood of misincorporation of
incorrect nucleotides into the extending DNA primer/template system.

[0052] Accordingly, the present invention further provides methods for optimizing the reactive
sequencing reaction to achieve rapid and complete incorporation of the correct nucleotide into
the DNA primer/template system, while limiting the misincorporation of incorrect nucleotides.
For example, dNTP concentrations may be lowered to reduce misincorporation of incorrect
nucleotides into the DNA primer. Km values for incorrect dNTPs can be as much as 1000-fold
higher than for correct nucleotides, so a reduction in dNTP concentration reduces the rate of
misincorporation far more than the rate of correct incorporation. Thus, in a preferred embodiment
of the invention the concentration of dNTPs in the sequencing reactions is approximately
5-20 µM. This range is chosen to keep incorporation rates as close as possible to the maximum
rate of ~400 nucleotides/s for T4 DNA polymerase.
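The effect of a 1000-fold Km difference can be illustrated with the Michaelis-Menten rate law. The sketch below assumes a hypothetical Km of 1 µM for the correct dNTP and, for simplicity, the same maximum rate (400 nucleotides/s) for correct and incorrect dNTPs; in practice the maximum rate for misincorporation is also lower, so this understates discrimination. All names are illustrative.

```python
def mm_rate(s_uM: float, km_uM: float, vmax: float = 400.0) -> float:
    """Michaelis-Menten incorporation rate (nucleotides/s) at substrate s_uM."""
    return vmax * s_uM / (km_uM + s_uM)


KM_CORRECT = 1.0                  # hypothetical Km for the correct dNTP, in uM
KM_INCORRECT = 1000 * KM_CORRECT  # up to ~1000-fold higher, per the text

for s in (5.0, 20.0, 1000.0):
    correct = mm_rate(s, KM_CORRECT)
    wrong = mm_rate(s, KM_INCORRECT)
    # ratio = misincorporation rate relative to correct incorporation rate
    print(f"[dNTP]={s:7.1f} uM  correct={correct:6.1f}/s  "
          f"incorrect={wrong:6.2f}/s  ratio={wrong / correct:.4f}")
```

Under these assumptions the correct rate stays near its maximum across 5-20 µM, while the relative misincorporation rate is below about 2% there and rises toward 50% at 1 mM.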
[0053] In addition, relatively short reaction times can be used to reduce the probability of
misincorporation. For an incorporation rate approaching the maximum rate of ~400
nucleotides/s, a reaction time of approximately 25 milliseconds (ms) will be sufficient to ensure
extension of 99.99% of primer strands.
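The 25 ms figure is consistent with treating extension as a pseudo-first-order process in which the fraction of unextended primers decays as exp(-k t). The sketch below assumes k equals the ~400 nucleotides/s maximum rate; that modelling choice and the function name are illustrative assumptions.

```python
import math


def time_to_extend(fraction: float, rate_per_s: float = 400.0) -> float:
    """Time (s) for `fraction` of primers to be extended, assuming
    pseudo-first-order kinetics: unextended(t) = exp(-rate * t)."""
    return -math.log(1.0 - fraction) / rate_per_s


t = time_to_extend(0.9999)
print(f"{t * 1e3:.1f} ms")                # ~23 ms, within the ~25 ms quoted
print(1.0 - math.exp(-400.0 * 0.025))     # fraction extended after 25 ms
```

About 23 ms suffices for 99.99% under this model, so a 25 ms window leaves a small margin.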

[0054] In a specific embodiment of the invention, DNA polymerases lacking 3' to 5'
exonuclease activity may be used for reactive sequencing to limit the exonucleolytic degradation
of primers that would otherwise occur in the absence of correct dNTPs. In the presence of all four
dNTPs, misincorporation frequencies by DNA polymerases possessing exonucleolytic
proofreading activity are as low as one error in 10^ to 10^ nucleotides incorporated, as discussed
in Echols and Goodman (1991, Annu. Rev. Biochem. 60:477-511); Goodman et al. (1993, Crit.
Rev. Biochem. Molec. Biol. 28:83-126); and Loeb and Kunkel (1982, Annu. Rev. Biochem.
52:429-457). In the absence of proofreading, DNA polymerase error rates are typically on the
order of 1 in 10^ to 1 in 10^. Although exonuclease activity increases the fidelity of a DNA
polymerase, the use of DNA polymerases having proofreading activity can pose technical
difficulties for the reactive sequencing method of the present invention. Not only will the
exonuclease remove any misincorporated nucleotides, but also, in the absence of a correct dNTP
complementary to the next template base, the exonuclease will remove correctly paired
nucleotides successively until a point on the template sequence is reached where the base is
complementary to the dNTP in the reaction cell. At this point, an idling reaction is established
where the polymerase repeatedly incorporates the correct dNMP and then removes it. Only
when a correct dNTP is present will the rate of polymerase activity exceed the exonuclease rate
so that an idling reaction is established that maintains the incorporation of that correct nucleotide 
at the 3' end of the primer. 
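The erosion-then-idling behaviour described above can be expressed as a walk back along the copied template. The following toy model is a hypothetical illustration, not the patent's procedure: it counts how many primer-terminal nucleotides an exonuclease-proficient polymerase strips (including the idling nucleotide itself) when only one dNTP is present, and it does not track bases of the original primer.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nucleotides_stripped(template: str, copied: int, dntp_base: str) -> int:
    """Nucleotides removed before an exo+ polymerase starts idling.

    template  -- template bases in the order they are copied
    copied    -- number of template positions the primer already pairs with
    dntp_base -- base of the single dNTP in the cell, e.g. "G" for dGTP
    """
    # Correct dNTP for the next template base: normal extension, nothing lost.
    if copied < len(template) and PAIR[template[copied]] == dntp_base:
        return 0
    # Walk back until a template base pairs with the dNTP on hand; the
    # polymerase then repeatedly re-adds and removes that dNMP (idling).
    for j in range(copied - 1, -1, -1):
        if PAIR[template[j]] == dntp_base:
            return copied - j
    # No pairing base in the copied region: the whole extension is eroded.
    return copied


print(nucleotides_stripped("ACGTTA", 5, "G"))
```

In this example the primer has copied ACGTT and dGTP is supplied; the exonuclease strips back to the template C at index 1, where dGMP is repeatedly added and removed.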

[0055] A number of T4 DNA polymerase mutants containing specific amino acid substitutions
possess exonuclease activity reduced by up to 10,000-fold relative to the wild-type enzyme.
For example, Reha-Krantz and Nonay (1993, J. Biol. Chem. 268:27100-27108) report that when
Asp 112 was replaced with Ala and Glu 114 was replaced with Ala (D112A/E114A) in T4
polymerase, these two amino acid substitutions reduced the exonuclease activity on double-stranded
DNA by a factor of about 300 relative to the wild-type enzyme. Such mutants may be
advantageously used in the practice of the invention for incorporation of nucleotides into the
DNA primer/template system.

[0056] In yet another embodiment of the invention, DNA polymerases which are more
accurate than wild-type polymerases at incorporating the correct nucleotide into a DNA
primer/template may be used. For example, in a D112A/E114A mutant T4 polymerase with a
third mutation in which Ile 417 is replaced by Val (I417V/D112A/E114A), the I417V mutation
results in an antimutator phenotype for the polymerase (Reha-Krantz and Nonay, 1994, J. Biol.
Chem. 269:5635-5643; Stocki et al., 1995, J. Mol. Biol. 254:15-28). This antimutator phenotype
arises because the polymerase moves the primer end from the polymerase site to the
exonuclease site, and thus proofreads, more frequently than the wild-type polymerase,
which increases the accuracy of synthesis.

[0057] In yet another embodiment of the invention, polymerase mutants that are capable of
efficiently incorporating fluorescently labeled nucleotides into the template DNA system
may be used in the practice of the invention. The efficiency of incorporation of
fluorescently labeled nucleotides may be reduced by bulky fluorophore labels
that can inhibit dNTP interaction at the active site of the polymerase. Polymerase mutants that
may be advantageously used for incorporation of fluorescently labeled dNTPs into DNA include
but are not limited to those described in U.S. Application Serial No. 08/632,742 filed April 16,
1996, which is incorporated by reference herein.

[0058] In a preferred embodiment of the invention, the reactive sequencing method utilizes a
two-cycle system. An exonuclease-deficient polymerase is used in the first cycle, and a mixture of
exonuclease-deficient and exonuclease-proficient enzymes is used in the second cycle. In the
first cycle, the primer/template system together with an exonuclease-deficient polymerase will be
presented sequentially with each of the four possible nucleotides. Reaction time and conditions
will be such that a sufficient fraction of primers (~98%) is extended to allow detection and
quantification of nucleotide incorporation, including accurate quantification of multiple
single-base repeats. In the second cycle, after identification of the correct nucleotide, a mixture
of exonuclease-proficient and exonuclease-deficient polymerases, or a polymerase containing
both types of activity, will be added together with the correct dNTP identified in the first
cycle to complete and proofread the primer extension. In this way, an exonuclease-proficient
polymerase is only present in the reaction cell when the correct dNTP is present, so that
exonucleolytic degradation of correctly extended strands does not occur, while degradation and
correct re-extension of previously incorrectly extended strands does occur, thus achieving
extremely accurate strand extension.
[0059] The detection of a DNA polymerase mediated extension reaction can be accomplished
in a number of ways. For example, the heat generated by the extension reaction can be measured 
using a variety of different techniques such as those employing thermopile, thermistor and 
refractive index measurements. 

[0060] In an embodiment of the invention, the heat generated by a DNA polymerase mediated
extension reaction can be measured. For example, in a reaction cell with a volume of (100 µm)³,
i.e., 1 nL, containing 1 µg of water as the sole thermal mass and 2 × 10¹¹ DNA template
molecules (300 fmol) tethered within the cell, the temperature of the water increases by
1 × 10⁻³ °C for a polymerase reaction which extends the primer by a single nucleoside
monophosphate. This calculation is based on the experimental determination that a one-base-pair
extension of a DNA chain is an exothermic reaction releasing 3.5 kcal per mole of base.
Thus extension of 300 fmol of primer strands by a single base produces 300 fmol × 3.5 kcal/mol,
or 1 × 10⁻⁹ cal, of heat. This is sufficient to raise the temperature of 1 µg of water by
1 × 10⁻³ °C. Such a temperature change can readily be detected using thermistors
(sensitivity < 10⁻⁴ °C), thermopiles (sensitivity < 10⁻⁵ °C) and refractive index measurements
(sensitivity < 10⁻⁶ °C).
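The arithmetic in the preceding paragraph can be checked directly. This sketch uses only the quantities stated there plus standard constants (Avogadro's number and a specific heat of 1 cal/(g·K) for water). It gives about 332 fmol rather than the rounded 300 fmol, and therefore a slightly larger temperature rise of about 1.2 × 10⁻³ °C.

```python
AVOGADRO = 6.02214076e23       # 1/mol
DELTA_H_CAL_PER_MOL = 3.5e3    # 3.5 kcal released per mole of base added (text)
WATER_CP_CAL_PER_G_K = 1.0     # specific heat of water, cal/(g*K)

molecules = 2e11               # tethered primer/template molecules
water_g = 1e-6                 # 1 ug of water, assumed to be the only thermal mass

mol = molecules / AVOGADRO                              # ~3.3e-13 mol
heat_cal = mol * DELTA_H_CAL_PER_MOL                    # ~1.2e-9 cal
delta_t = heat_cal / (water_g * WATER_CP_CAL_PER_G_K)   # ~1.2e-3 K

print(f"{mol * 1e15:.0f} fmol, {heat_cal:.2e} cal, dT = {delta_t:.2e} K")
# Signal-to-resolution ratio for each sensor type quoted in the text
for sensor, resolution in {"thermistor": 1e-4, "thermopile": 1e-5,
                           "refractive index": 1e-6}.items():
    print(f"{sensor:16s} signal/resolution ~ {delta_t / resolution:,.0f}")
```

Even the least sensitive sensor listed sees a single-base signal roughly ten times its stated resolution under these idealized assumptions (no heat loss to the cell walls).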

[0061] In a specific embodiment of the invention, thermopiles may be used to detect temperature
changes. Such thermopiles are known to have a high sensitivity to temperature and can make
measurements in the range of tens of microdegrees with time constants of several seconds.
Thermopiles may be fabricated by constructing serial sets of junctions of two dissimilar metals
and physically arranging the junctions so that alternating junctions are separated in space. One
set of junctions is maintained at a constant reference temperature, while the alternate set of
junctions is located in the region whose temperature is to be sensed. A temperature difference
between the two sets of junctions produces a potential difference across the junction set which is
proportional to the temperature difference, to the thermoelectric coefficient of the junction and to
the number of junctions. For optimum response, bimetallic pairs with a large thermoelectric
coefficient, such as bismuth and antimony, are desirable. Thermopiles may be fabricated using
thin-film deposition techniques in which evaporated metal vapor is deposited onto insulating
substrates through specially fabricated masks. Thermopiles that may be used in the practice of the
invention include thermopiles such as those described in U.S. Patent 4,935,345, which is
incorporated by reference herein.
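The proportionality just described (output proportional to junction count, thermoelectric coefficient and temperature difference) can be sketched as V = N · S · ΔT. The values below are assumptions for illustration: a Bi-Sb coefficient on the order of 1e-4 V/K, 150 junctions as in the device described in the next paragraph, and the ~1 × 10⁻³ °C single-base estimate from [0060].

```python
def thermopile_voltage(n_junctions: int, seebeck_v_per_k: float,
                       delta_t_k: float) -> float:
    """Open-circuit thermopile output V = N * S * dT, proportional to the
    number of junction pairs, the pair's thermoelectric coefficient and the
    temperature difference between sensor and reference junctions."""
    return n_junctions * seebeck_v_per_k * delta_t_k


# Assumed, order-of-magnitude inputs (see lead-in)
v = thermopile_voltage(150, 1.0e-4, 1e-3)
print(f"{v * 1e6:.1f} uV")
```

A signal of roughly 15 µV is well within the range of a high-resolution digital voltmeter. The linear form also shows why adding junctions or choosing a higher-coefficient metal pair directly scales the response.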

[0062] In a specific embodiment of the invention, miniature thin film thermopiles produced by 
metal evaporation techniques, such as those described in U.S. Patent 4,935,345 incorporated 
herein by reference, may be used to detect the enthalpy changes. Such devices have been made 
by vacuum evaporation through masks of about 10 mm square. Using methods of 
photolithography, sputter etching and reverse lift-off techniques, devices as small as 2 mm 
square may be constructed without the aid of modern microlithographic techniques. These 
devices contain 150 thermoelectric junctions, employ 12 µm line widths, and can measure
the exothermic heat of reaction of enzyme-catalyzed reactions in flow streams in which the enzyme
is preferably immobilized on the surface of the thermopile.

[0063] To incorporate thermopile detection technology into a reactive sequencing device,
thin-film bismuth-antimony thermopiles 2, as shown in Figure 1, may be fabricated by successive
electron-beam evaporation of bismuth and antimony metals through two different
photolithographically generated masks in order to produce a zigzag array of alternating thin
bismuth and antimony wires which are connected to form two sets of bismuth-antimony
thermocouple junctions. Modern microlithographic techniques will allow fabrication of devices
at least one order of magnitude smaller than those previously made, i.e., with line widths as small
as 1 µm and overall dimensions on the order of 100 µm². One set of junctions 4 (the sensor
junctions) is located within the reaction cell 6, i.e., deposited on a wall of the reaction cell, while 
the second reference set of junctions 8 is located outside the cell at a reference point whose 
temperature is kept constant. Any difference in temperature between the sensor junctions and 
the reference junctions results in an electric potential being generated across the device, which 
can be measured by a high-resolution digital voltmeter 10 connected to measurement points 12 
at either end of the device. It is not necessary that the temperature of the reaction cell and the 
reference junctions be the same in the absence of a polymerase reaction event, only that a change 
in the temperature of the sensor junctions due to a polymerase reaction event be detectable as a 
change in the voltage generated across the thermopile. 

[0064] In addition to thermopiles, as shown in Figure 2, a thermistor 14 may also be used to 
detect temperature changes in the reaction cell 6 resulting from DNA polymerase mediated 
incorporation of dNMPs into the DNA primer strand. Thermistors are semiconductors composed 
of a sintered mixture of metallic oxides such as manganese, nickel, and cobalt oxides. This 
material has a large temperature coefficient of resistance, typically ~ 4% per °C, and so can sense 
extremely small temperature changes when the resistance is monitored with a stable, high- 
resolution resistance-measuring device such as a digital voltmeter, e.g., Keithley Instruments 
Model 2002. A thermistor 14, such as that depicted in Figure 2, may be fabricated in the reactive 
sequencing reaction cell by sputter depositing a thin film of the active thermistor material onto
the surface of the reaction cell from a single target consisting of hot pressed nickel, cobalt and 
manganese oxides. Metal interconnections 16 which extend out beyond the wall of the reaction 
cell may also be fabricated in a separate step so that the resistance of the thermistor may be 
measured using an external measuring device 18. 
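For small temperature changes, the resistance signal can be estimated with the linearized relation ΔR = α · R0 · ΔT. The sketch assumes a hypothetical 10 kΩ thermistor film and takes α as -4% per °C; the magnitude is the value quoted above, and the negative sign reflects that these metal-oxide thermistors are negative-temperature-coefficient devices.

```python
def thermistor_delta_r(r0_ohm: float, alpha_per_k: float,
                       delta_t_k: float) -> float:
    """Linearized resistance change dR = alpha * R0 * dT, valid for small dT.
    A negative alpha (NTC material) means resistance falls as it warms."""
    return alpha_per_k * r0_ohm * delta_t_k


# Hypothetical 10 kOhm film, alpha = -4 %/K, single-base dT of ~1e-3 K
dr = thermistor_delta_r(10e3, -0.04, 1e-3)
print(f"dR = {dr:.2f} ohm ({dr / 10e3 * 1e6:.0f} ppm)")
```

A change of a few tens of parts per million is why a stable, high-resolution resistance-measuring instrument is needed for this approach.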

[0065] Temperature changes may also be sensed using a refractive index measurement 
technique. For example, techniques such as those described in Bornhop (1995, Applied Optics 
34:3234-323) and U.S. Patent 5,325,170, may be used to detect refractive index changes for 
liquids in capillaries. In such a technique, a low-power He-Ne laser is aimed off-center at a right
angle to a capillary and undergoes multiple internal reflections. Part of the beam travels through
the liquid while the remainder reflects only off the external capillary wall. The two beams
undergo different phase shifts depending on the refractive index difference between the liquid
and capillary. The result is an interference pattern whose fringe positions are extremely sensitive
to temperature-induced refractive index changes.

[0066] In a further embodiment of the invention, the thermal response of the system may be
increased by including inorganic pyrophosphatase enzyme, which is contacted with the
template system along with the dNTP solution. Additional heat is then released as the
pyrophosphate liberated from the dNTPs upon incorporation into the template system is
hydrolyzed by the inorganic pyrophosphatase.

[0067] In another embodiment, the pyrophosphate released upon incorporation of dNTPs may
be removed from the template system and hydrolyzed in a separate reaction cell downstream, and
the resultant heat detected using thermopile, thermistor or refractive index methods. In this
reaction cell, inorganic pyrophosphatase enzyme may be mixed in solution with the dNTP solution
removed from the DNA template system, or alternatively the inorganic pyrophosphatase enzyme
may be covalently tethered to the wall of the reaction cell.

[0068] Alternatively, the polymerase-catalyzed incorporation of a nucleotide base can be 
detected using fluorescence and chemiluminescence detection schemes. The DNA polymerase 
mediated extension is detected when a fluorescent or chemiluminescent signal is generated upon 
incorporation of a fluorescently or chemiluminescently labeled dNMP into the extending DNA 
primer strand. Such tags are attached to the nucleotide in such a way as to not interfere with the 
action of the polymerase. For example, the tag may be attached to the nucleotide base by a 
linker arm sufficiently long to move the bulky fluorophore away from the active site of the 
enzyme. 

[0069] For use of such detection schemes, nucleotide bases are labeled by covalently attaching 
a compound such that a fluorescent or chemiluminescent signal is generated following 
incorporation of a dNTP into the extending DNA primer/template. Examples of fluorescent 
compounds for labeling dNTPs include but are not limited to fluorescein, rhodamine, and 
BODIPY (4,4-difluoro-4-bora-3a,4a-diaza-s-indacene). See Handbook of Molecular Probes and 
Fluorescent Chemicals available from Molecular Probes, Inc. (Eugene, OR). Examples of 
chemiluminescence based compounds that may be used in the sequencing methods of the 
invention include but are not limited to luminol and dioxetanones (see Gundermann and
McCapra, "Chemiluminescence in Organic Chemistry", Springer-Verlag, Berlin Heidelberg,
1987).
[0070] Fluorescently or chemiluminescently labeled dNTPs are added individually to a DNA
template system containing template DNA annealed to the primer, DNA polymerase and the 
appropriate buffer conditions. After the reaction interval, the excess dNTP is removed and the 
system is probed to detect whether a fluorescent or chemiluminescent tagged nucleotide has been 
incorporated into the DNA template. Detection of the incorporated nucleotide can be 
accomplished using different methods that will depend on the type of tag utilized. 

[0071] For fluorescently-tagged dNTPs the DNA template system may be illuminated with 
optical radiation at a wavelength which is strongly absorbed by the tag entity. Fluorescence from 
the tag is detected using for example a photodetector together with an optical filter which 
excludes any scattered light at the excitation wavelength. 

[0072] Since labels on previously incorporated nucleotides would interfere with the signal 
generated by the most recently incorporated nucleotide, it is essential that the fluorescent tag be 
removed at the completion of each extension reaction. To facilitate removal of a fluorescent tag, 
the tag may be attached to the nucleotide via a chemically or photochemically cleavable linker 
using methods such as those described by Metzger, M.L. et al. (1994, Nucleic Acids Research
22:4259-4267) and Burgess, K. et al., (1997, J. Org. Chem. 62:5165-5168) so that the 
fluorescent tag may be removed from the DNA template system before a new extension reaction 
is carried out. 

[0073] In a further embodiment utilizing fluorescent detection, the fluorescent tag is attached 
to the dNTP by a photocleavable or chemically cleavable linker, and the tag is detached 
following the extension reaction and removed from the template system into a detection cell 
where the presence, and the amount, of the tag is determined by optical excitation at a suitable
wavelength and detection of fluorescence. In this embodiment, the possibility of fluorescence 
quenching, due to the presence of multiple fluorescent tags immediately adjacent to one another 
on a primer strand which has been extended complementary to a single base repeat region in the 
template, is minimized, and the accuracy with which the repeat number can be determined is 
optimized. In addition, excitation of fluorescence in a separate chamber minimizes the 
possibility of photolytic damage to the DNA primer/template system. 

[0074] In an additional embodiment utilizing fluorescent detection, the signal from the 
fluorescent tag can be destroyed using a chemical reaction which specifically targets the 
fluorescent moiety and reacts to form a final product which is no longer fluorescent. In this
embodiment, the fluorescent tag attached to the nucleotide base is destroyed following extension 
and detection of the fluorescence signal, without the removal of the tag. In a specific 
embodiment, fluorophores attached to dNTP bases may be selectively destroyed by reaction with 
compounds capable of extracting an electron from the excited state of the fluorescent moiety 
thereby producing a radical ion of the fluorescent moiety which then reacts to form a final 
product which is no longer fluorescent. In a further specific embodiment, the signal from a 
fluorescent tag is destroyed by photochemical reaction with the cation of a diphenyliodonium 
salt following extension and detection of the fluorescence label. The fluorescent tag attached to 
the incorporated nucleotide base is destroyed, without removal of the tag, by the addition of a 
solution of a diphenyliodonium salt to the reaction cell and subsequent UV light exposure. The 
diphenyliodonium salt solution is removed and the reactive sequencing is continued. This 
embodiment does not require dNTPs with chemically or photochemically cleavable linkers,
since the fluorescent tag need not be removed.

[0075] In a further embodiment of the technique, the response generated by a DNA 
polymerase-mediated extension reaction can be amplified. In this embodiment, the dNTP is 
chemically modified by the covalent attachment of a signaling tag through a linker that can be 
cleaved either chemically or photolytically. Following exposure of the dNTP to the 
primer/template system and flushing away any unincorporated chemically modified dNTP, any 
signaling tag that has been incorporated is detached by a chemical or photolytic reaction and 
flushed out of the reaction chamber to an amplification chamber in which an amplified signal 
may be produced and detected. 

[0076] A variety of methods may be used to produce an amplified signal. In one such method 
the signaling tag has a catalytic function. When the catalytic tag is cleaved and allowed to react 
with its substrate, many cycles of chemical reaction ensue producing many moles of product per 
mole of catalytic tag, with a corresponding multiplication of reaction enthalpy. Either the
reaction product is detected, through some property such as color or absorbance, or the amplified
heat product is detected by a thermal sensor. For example, an enzyme may be covalently attached
to the dNTP via a cleavable linker arm of sufficient length that the enzyme does not interfere with
the active site of the polymerase enzyme. Following incorporation onto the DNA primer strand,
that enzyme is detached and transported to a second reactor volume in which it is allowed to
interact with its specific substrate; an amplified response is thus obtained as each enzyme
molecule carries out many cycles of reaction. For example, the enzyme catalase (CAT) catalyzes
the reaction:
$$\mathrm{H_2O_2} \xrightarrow{\text{CAT}} \mathrm{H_2O} + \tfrac{1}{2}\,\mathrm{O_2} + {\sim}100\ \text{kJ/mol heat}$$

If each dNTP is tagged with a catalase molecule which is detached after dNMP incorporation and
allowed to react downstream with hydrogen peroxide, each nucleotide incorporation would
generate ~25 kcal/mol × N of heat, where N is the number of hydrogen peroxide molecules
decomposed by the catalase. The heat of decomposition of hydrogen peroxide is already ~6-8
times greater than that of nucleotide incorporation (i.e., 3.5-4 kcal/mol). For decomposition of
~100-150 hydrogen peroxide molecules, the amount of heat generated per base incorporation
approaches 1000 times that of the unamplified reaction. Similarly, enzymes which produce
colored products, such as those commonly used in enzyme-linked immunosorbent assays 
(ELISA) could be incorporated as detachable tags. For example the enzyme alkaline 
phosphatase converts colorless p-nitrophenyl phosphate to a colored product (p-nitrophenol); the 
enzyme horseradish peroxidase converts colorless o-phenylenediamine hydrochloride to an 
orange product. Chemistries for linking these enzymes to proteins such as antibodies are 
well-known to those versed in the art, and could be adapted to link the enzymes to nucleotide 
bases via linker arms that maintain the enzymes at a distance from the active site of the 
polymerase enzymes. 
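The catalase amplification estimate can be reproduced from the two enthalpies quoted above (~25 kcal/mol per peroxide decomposed, 3.5 kcal/mol per nucleotide incorporated). This minimal sketch counts only the peroxide heat, ignoring the small contribution of the incorporation itself; the function name is illustrative.

```python
def amplification(n_peroxide: int,
                  dh_peroxide_kcal: float = 25.0,
                  dh_incorporation_kcal: float = 3.5) -> float:
    """Heat released per incorporation by a detached catalase tag that
    decomposes n_peroxide H2O2 molecules, relative to the unamplified
    heat of a single nucleotide addition."""
    return n_peroxide * dh_peroxide_kcal / dh_incorporation_kcal


for n in (1, 100, 150):
    print(n, round(amplification(n)))
```

A single turnover already gives about a 7-fold gain, and 100-150 turnovers give roughly 700-1100-fold, consistent with the "approaches 1000 times" figure.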

[0077] In a further embodiment, an amplified thermal signal may be produced when the 
signaling tag is an entity which can stimulate an active response in cells which are attached to, or 
held in the vicinity of, a thermal sensor such as a thermopile or thermistor. Pizziconi and Page 
(1997, Biosensors and Bioelectronics 12:457-466) reported that harvested and cultured mast cell 
populations could be activated by calcium ionophore to undergo exocytosis to release histamine, 
up to 10-30 pg (100-300 fmol) per cell. The multiple cell reactions leading to exocytosis are
themselves exothermic. This process is further amplified using the enzymes diamine oxidase to
oxidize the histamine to hydrogen peroxide and imidazoleacetaldehyde, and catalase to
disproportionate the hydrogen peroxide. The two reactions together liberate over 100 kJ of heat
per mole of histamine. For example, a calcium ionophore may be covalently attached to the dNTP
base via a linker arm which distances the linked calcium ionophore from the active site of the
polymerase enzyme and is chemically or photochemically cleavable. Following the DNA
polymerase catalyzed incorporation step, and flushing away unincorporated nucleotides, any
calcium ionophore remaining bound to an incorporated nucleotide may be cleaved and flushed 
downstream to a detection chamber containing a mast cell-based sensor such as described by 
Pizziconi and Page (1997, Biosensors and Bioelectronics 12:457-466). The calcium ionophore 
would bind to receptors on the mast cells stimulating histamine release with the accompanying 
generation of heat. The heat production could be further amplified by introducing the enzymes 
diamine oxidase to oxidize the histamine to hydrogen peroxide and imidazoleacetaldehyde, and 
catalase to disproportionate the hydrogen peroxide. Thus a significantly amplified heat signal 
would be produced which could readily be detected by a thermopile or thermistor sensor within, 
or in contact with, the reaction chamber. 
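The mass and heat figures for the mast cell sensor can be cross-checked. This sketch converts the quoted 10-30 pg of histamine per cell to femtomoles using the molar mass of histamine (C5H9N3, about 111.15 g/mol) and applies the "over 100 kJ per mole" figure as a lower bound. The per-cell heat it prints is a derived illustration, not a value reported in the text.

```python
HISTAMINE_G_PER_MOL = 111.15   # molar mass of histamine, C5H9N3
HEAT_J_PER_MOL = 100e3         # "over 100 kJ per mole of histamine", used as a floor


def histamine_fmol(pg: float) -> float:
    """Convert a mass of histamine in picograms to femtomoles."""
    return pg * 1e-12 / HISTAMINE_G_PER_MOL * 1e15


for pg in (10, 30):
    fmol = histamine_fmol(pg)
    heat_nj = fmol * 1e-15 * HEAT_J_PER_MOL * 1e9   # mol * J/mol -> nJ
    print(f"{pg} pg -> {fmol:.0f} fmol -> >= {heat_nj:.0f} nJ per cell")
```

The conversion gives roughly 90-270 fmol, in line with the approximate 100-300 fmol range stated above.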

[0078] In a further embodiment utilizing chemiluminescent detection, the chemiluminescent 
tag is attached to the dNTP by a photocleavable or chemically cleavable linker. The tag is 
detached following the extension reaction and removed from the template system into a detection 
cell where the presence, and the amount, of the tag is determined by an appropriate chemical 
reaction and sensitive optical detection of the light produced. In this embodiment, the 
possibility of a non-linear optical response due to the presence of multiple chemiluminescent
tags immediately adjacent to one another on a primer strand which has been extended 
complementary to a single base repeat region in the template, is minimized, and the accuracy 
with which the repeat number can be determined is optimized. In addition, generation of 
chemiluminescence in a separate chamber minimizes chemical damage to the DNA 
primer/template system, and allows detection under harsh chemical conditions which otherwise 
would chemically damage the DNA primer/template. In this way, chemiluminescent tags can be 
chosen to optimize chemiluminescence reaction speed, or compatibility of the tagged dNTP with 
the polymerase enzyme, without regard to the compatibility of the chemiluminescence reaction 
conditions with the DNA primer/template. 

[0079] In a further embodiment of the invention, the concentration of the dNTP solution 
removed from the template system following each extension reaction can be measured by 
detecting a change in UV absorption due to a change in the concentration of dNTPs, or a change 
in fluorescence response of fluorescently-tagged dNTPs. The incorporation of nucleotides into 
the extended template would result in a decreased concentration of nucleotides removed from 
the template system. Such a change could be detected by measuring the UV absorption of the 
buffer removed from the template system following each extension cycle. 

[0080] In a further embodiment of the invention, extension of the primer strand may be sensed 
by a device capable of sensing fluorescence from, or resolving an image of, a single DNA 
molecule. Devices capable of sensing fluorescence from a single molecule include the confocal 
microscope and the near-field optical microscope. Devices capable of resolving an image of a 
single molecule include the scanning tunneling microscope (STM) and the atomic force
microscope (AFM). 

[0081] In this embodiment of the invention, a single DNA template molecule with attached 
primer is immobilized on a surface and viewed with an optical microscope or an STM or AFM 
before and after exposure to buffer solution containing a single type of dNTP, together with 
polymerase enzyme and other necessary electrolytes. When an optical microscope is used, the 
single molecule is exposed serially to fluorescently-tagged dNTP solutions and, as before,
incorporation is sensed by detecting the fluorescent tag after excess unreacted dNTP is removed.
Again, as before, the incorporated fluorescent tag must be cleaved and discarded before a
subsequent tag can be detected. Using the STM or AFM, the change in length of the primer
strand is imaged to detect incorporation of the dNTP. Alternatively, the dNTP may be tagged
with a physically bulky molecule that is more readily visible in the STM or AFM, and this bulky
tag is removed and discarded before each fresh incorporation reaction.

[0082] When sequencing a single molecular template in this way, the possibility of incomplete
reaction producing an erroneous signal and out-of-phase strand extension does not exist, and the
consequent limitations on read length do not apply. For a single molecular template, reaction
either occurs or it does not; if it does not, then extension either ceases and is known to cease,
or correct extension occurs in a subsequent cycle with the correct dNTP. In the event that an
incorrect nucleotide is incorporated, which has the same probability as in the multiple-strand
processes discussed earlier, for example 1 in 1,000, an error is recorded in the sequence, but this
error does not propagate or affect subsequent readout, and so the read length is not limited by
incorrect incorporation.
[0083] In the reactive sequencing process, extension failures will typically arise due to the
kinetics of the extension reaction and limitations on the amount of time allotted for each 
extension trial with the single deoxynucleotide triphosphates (dNTP's). When reaction is 
terminated by flushing away the dNTP supply, some small fraction of the primer strands may 
remain unextended. These strands on subsequent dNTP reaction cycles will continue to extend 
but will be out of phase with the majority strands, giving rise to small out-of-phase signals (i.e. 
signaling a positive incorporation for an added dNTP which is incorrect for extension of the 
majority strands). Because extension failure can occur, statistically, on any extension event, 
these out-of-phase signals will increase as the population of strands with extension failures 
grows. Ultimately the out-of-phase signal becomes comparable in amplitude with the signal due 
to correct extension of the majority strands and the sequence may be unreadable. The length by 
which the primer has been extended when the sequence becomes unreadable is known as the 
sequencing read length. 
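The growth of the out-of-phase population can be made concrete with a simple model, assuming that each extension event fails independently with the same probability p, so that the fraction of strands with no failure after n events is (1 - p)^n. Taking, as a hypothetical readability criterion, the point at which this fraction falls to one half (the out-of-phase population then equals the in-phase population), the read length in extension events is ln(0.5)/ln(1 - p). The function names and the criterion are illustrative assumptions, not part of the specification.

```python
import math


def in_phase_fraction(p_fail, n_events):
    """Fraction of primer strands that have not yet suffered an extension failure."""
    return (1.0 - p_fail) ** n_events


def read_length_estimate(p_fail, threshold=0.5):
    """Extension events after which the in-phase fraction drops to `threshold`.

    threshold=0.5 is a hypothetical criterion: the out-of-phase population is
    then as large as the in-phase population.
    """
    return math.log(threshold) / math.log(1.0 - p_fail)


for p in (0.01, 0.001):
    print(f"p = {p}: about {read_length_estimate(p):.0f} extension events")
```

With a 1% failure probability per extension event this simple model gives roughly 69 events, and with 0.1% roughly 693, illustrating why extension failure limits read length unless the out-of-phase signals are dealt with.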

[0084] The present invention relates to a method that can extend the sequencing read length in
two ways: first, by discriminating between the in-phase and out-of-phase signals, and second, by
calculating where, and how, a dNTP probe sequence can be altered so as to extend the
out-of-phase strands selectively and bring them back into phase with the majority strands.

[0085] Specifically, a method is provided for discriminating between the in-phase and
out-of-phase sequencing signals, comprising:

(i) detecting and measuring error signals, thereby determining the size of
the trailing strand population;

(ii) determining the gap between the 3' terminus of the trailing strand primers and the 3'
terminus of the leading strand primers;

(iii) simulating the occurrence of an extension failure at a point upstream
from the 3' terminus of the leading strands, thereby predicting at each
extension step the exact point in the sequence previously traversed by
the leading strands to which the 3' termini of the trailing strands have
been extended;

(iv) predicting for each dNTP introduced the signal to be expected from
correct extension of the trailing strands; and

(v) subtracting the predicted signal from the measured signal to yield a
signal due only to correct extension of the leading strand population.

[0086] "Upstream" refers to the known sequence of bases correctly incorporated onto the 
primer strands. "Downstream" refers to the sequence beyond the 3' terminus. Thus for the
leading strand population the downstream sequence is unknown but is predetermined by the
sequence of the template strand that has not yet been read; for the trailing strand population, the
downstream sequence is known for the gap between the 3' termini of the trailing and leading
strands. 
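Steps (iii) to (v) lend themselves to a computer implementation. The following is a minimal sketch under simplifying assumptions that are not part of the specification: noise-free signals normalized so that one base added to every strand gives 1.0; a fixed delivery order ACGT; and a single trailing population whose fraction f is already known from step (i) and whose strands all failed at one known step at which extension would otherwise have occurred. The sequence is written as the bases added to the primer, and all names are hypothetical.

```python
CYCLE = "ACGT"  # assumed fixed dNTP delivery order


def extend(seq, pos, base):
    """Advance a 3' terminus across the whole run of `base` (a repeat such as AAAA
    is traversed in one reaction); `seq` lists the bases added to the primer."""
    while pos < len(seq) and seq[pos] == base:
        pos += 1
    return pos


def positions(seq, n_steps, fail_step=-1):
    """3' position after each dNTP delivery; the strand fails to extend at fail_step."""
    pos, out = 0, []
    for t in range(n_steps):
        if t != fail_step:
            pos = extend(seq, pos, CYCLE[t % 4])
        out.append(pos)
    return out


def increments(pos):
    """Bases added at each delivery."""
    return [b - a for a, b in zip([0] + pos[:-1], pos)]


def measured_signals(seq, n_steps, f, fail_step):
    """Idealized signals: a fraction f of the strands failed once at fail_step."""
    lead = increments(positions(seq, n_steps))
    trail = increments(positions(seq, n_steps, fail_step))
    return [(1 - f) * a + f * b for a, b in zip(lead, trail)]


def decode(signals, f, fail_step):
    """Read the leading-strand sequence, subtracting the predicted trailing signal."""
    known = ""   # leading-strand sequence read so far
    trail = 0    # simulated 3' position of the trailing strands
    for t, s in enumerate(signals):
        base = CYCLE[t % 4]
        if t < fail_step:                    # all strands still in phase
            known += base * round(s)
            trail = len(known)
            continue
        if t == fail_step:
            predicted = 0.0                  # (iii) trailing strands fail to extend here
        else:
            new_trail = extend(known, trail, base)  # (iii) simulate on the known sequence
            predicted = f * (new_trail - trail)     # (iv) expected trailing signal
            trail = new_trail
        known += base * round((s - predicted) / (1 - f))  # (v) leading signal only
    return known


seq = "AACGTTTGCAGGATC"                  # illustrative bases added to the primer
n_steps, f, fail = 4 * len(seq) + 8, 0.45, 1  # step 1 (dCTP) extends the leading strands
print(decode(measured_signals(seq, n_steps, f, fail), f, fail) == seq)
```

Because the trailing strands are simulated on the sequence already read from the leading strands, the predicted trailing signal is available before each new leading-strand signal is interpreted, which is what allows the subtraction in step (v). Rounding the raw signals directly, without that subtraction, can misread a step once f is large, since a trailing-strand signal of f × m bases may round to a spurious leading-strand base.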

[0087] The gap between the leading and trailing primer strands may be 1, 2 or 3 bases (where a 
single base repeat of any length, e.g. AAAA, is counted as a single base because the entire repeat 
will be traversed in a single reaction cycle if the correct dNTP is introduced), but can never
exceed 3 bases nor shrink spontaneously to zero if the reaction cycle of the four dNTP's is 
unchanged and no other reaction errors occur, for example a second extension failure on the 
same primer strand. If the reaction cycle of the four dNTP's is unchanged, it may readily be 
understood that a primer strand which has failed to extend when the correct dNTP, for example 
dATP, is in the reaction chamber cannot trail the leading (majority) strands (which did extend) 
by more than 3 bases, because the fourth base in the dNTP reaction cycle will always once again 
be the correct base (dATP) for the strand which failed to extend previously. Similarly, a trailing 
strand resulting from an extension failure can never re-synchronize with the leading strands if 
extension subsequently proceeds correctly, because the leading strands will always have 
extended by at least one more nucleotide - G, T, or C in the example of an A
extension failure - before the trailing strand can add the missing A. The effect is that after each 
complete dNTP cycle the trailing strands always follow the leading strands by an extension 
amount that represents the bases added in one complete dNTP cycle at a given point in the 
sequence. A further consequence is that all trailing strands that have undergone a single failure 
are in phase with each other regardless of the point at which the extension failure occurred. 
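The phase relationships described in this paragraph can be checked with a small simulation. The sketch below assumes a fixed delivery order ACGT, complete extension through each repeat, and a single failure per trailing strand at a step where it would otherwise have extended; the function names and the example sequence are hypothetical. It confirms that, once recovered, a trailing strand reaches each position exactly one full dNTP cycle (four deliveries) after the leading strands did, so it never catches up while template remains, and trailing strands that failed at different points coincide once each has recovered.

```python
CYCLE = "ACGT"  # assumed fixed dNTP delivery order


def extend(seq, pos, base):
    """Traverse the whole repeat of `base` at `pos` (AAAA counts as one reaction)."""
    while pos < len(seq) and seq[pos] == base:
        pos += 1
    return pos


def positions(seq, n_steps, fail_step=-1):
    """3' position of a primer after each delivery; it skips the reaction at fail_step."""
    pos, out = 0, []
    for t in range(n_steps):
        if t != fail_step:
            pos = extend(seq, pos, CYCLE[t % 4])
        out.append(pos)
    return out


seq = "AACGTTTGCAGGATCCGTAACGTACGGATCA"  # bases added to the primer (illustrative)
n = 4 * len(seq) + 8
lead = positions(seq, n)
fail_steps = [t for t in range(n) if lead[t] > (lead[t - 1] if t else 0)]

for s in fail_steps:
    trail = positions(seq, n, s)
    for t in range(s + 4, n):
        # After recovery the trailing strand repeats the leading strands'
        # trajectory delayed by exactly one dNTP cycle (four deliveries).
        assert trail[t] == lead[t - 4]
    for t in range(s, n):
        # The gap never shrinks to zero while template remains to be read.
        if trail[t] < len(seq):
            assert trail[t] < lead[t]

print("checked", len(fail_steps), "single-failure cases")
```

Since every recovered trailing strand equals the leading trajectory delayed by one cycle, strands that failed at different points necessarily share that trajectory, which is the in-phase property relied on by the discrimination method.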

[0088] The methods described herein may be utilized to significantly extend the read length 
that can be achieved by the technique of reactive sequencing by providing a high level of 
immunity to erroneous signals arising from extension failure. In a preferred embodiment of the 
invention, the discrimination method of the invention is computer based. 

[0089] First, determination of the readout signals allows real-time discrimination between the 
signals due to correct extension of the leading strand population and error signals arising from 
extension of the population of trailing strands resulting from extension failure. Using this
information, accurate sequence readout can be obtained significantly beyond the point at which 
the trailing strand signals would begin to mask the correct leading strand signals. In fact, because 
the trailing strand signals can always be distinguished from the leading strand signals, it is 
possible to allow the trailing strand population to continue to grow, at the expense of the leading 
strands, to the point where the sequence is read from the signals generated on the trailing strand 
population, and the leading strand signals are treated as error signals to be corrected for. 
Ultimately, as the probability that a primer strand will have undergone at least one extension 
failure approaches unity, the signals from the leading strand population will disappear. 
Correspondingly the probability will increase that a trailing strand will undergo a second 
extension failure; the signals from this second population of double failure strands can be 
monitored and the single failure strand signals corrected in just the same way as the zero failure 
strand signals were corrected for signals due to single failure strands. 

[0090] Second, knowledge of the leading strand sequence permits one to determine the point to
which the trailing strands have advanced, by simulating the effect of an extension failure on that
known sequence in a computer, and also to determine the sequence in the 1, 2 or 3 base gap
between these strands and the leading strands. For a given template sequence, the dNTP probe
cycle can therefore be altered at any point to selectively extend the trailing strands while not
extending the leading strands, thereby resynchronizing the populations. Alternatively, the gap
between leading and trailing strands can be simulated in the computer and the gap can be
eliminated by reversing the dNTP cycle whenever the gap shrinks to a single base. These
processes are referred to as "healing." If a large number of different sequences are being read in
parallel with the same dNTP reagents, an altered dNTP probe cycle that is correct for healing
extension failure strands on a given sequence may not be correct for healing other sequences.
However, with a large enough number of parallel sequence readouts, roughly one-third of the
sequences will have trailing strands with a 1-base gap at any point, and so reversal of the dNTP
probe cycle at arbitrary intervals will heal roughly one-third of the readouts with extension
failure gaps. Repeated arbitrary reversal of the dNTP probe cycle eventually heals roughly
two-thirds of all the readouts. The overall effect of these error correction and error elimination
processes is to reduce or eliminate any limitation on read length arising from extension failure.
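Cycle-reversal healing can be illustrated with a minimal simulation. Its conventions are assumptions, not taken from the specification: delivery starts in the order ACGT; "reversing the dNTP cycle" is taken to mean continuing in the opposite cyclic order starting from the base delivered before the last one (for example, after ...A C G T, the next deliveries are G C A T G ...); the gap is counted in homopolymer runs; and reversal is triggered once, at the first delivery after the failure at which the gap is a single run. All names are hypothetical.

```python
CYCLE = "ACGT"  # assumed forward dNTP delivery order


def extend(seq, pos, base):
    """Traverse the whole repeat of `base` starting at `pos`."""
    while pos < len(seq) and seq[pos] == base:
        pos += 1
    return pos


def run_count(s):
    """Gap size in homopolymer runs: a repeat such as AAAA counts as one base."""
    return sum(1 for i, b in enumerate(s) if i == 0 or b != s[i - 1])


def simulate(seq, fail_step, n_steps, reverse_on_single_gap=True):
    """Return (base, leading_pos, trailing_pos) after each dNTP delivery."""
    lead = trail = 0
    idx, direction = 0, 1
    history = []
    for t in range(n_steps):
        base = CYCLE[idx % 4]
        lead = extend(seq, lead, base)
        if t != fail_step:               # the trailing strands fail once, at fail_step
            trail = extend(seq, trail, base)
        history.append((base, lead, trail))
        if (reverse_on_single_gap and direction == 1 and t > fail_step
                and run_count(seq[trail:lead]) == 1):
            direction = -1               # reverse: next is the base before `base`
        idx += direction
    return history


seq = "AGTCCATG"                         # illustrative bases added to the primer
healed = simulate(seq, fail_step=0, n_steps=12)
forward = simulate(seq, fail_step=0, n_steps=12, reverse_on_single_gap=False)
for (b1, l1, t1), (b2, l2, t2) in zip(healed, forward):
    print(b1, l1, t1, "|", b2, l2, t2)
```

In this example the trailing strands fail on the first dATP delivery. After the following dCTP delivery the gap is the single A, so the order is reversed and dATP is delivered again; the trailing strands add the missing A, the leading strands (which next need G) do not extend, and the two populations are back in phase. Without reversal, the trailing strands in this example remain behind for all twelve deliveries.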

[0091] The ability to overcome the read length limitations imposed by extension failure 
provides significant additional flexibility in experimental design. For example, it may be that 
read length is not limited by extension failure, but rather by misincorporation of incorrect 
nucleotides, which shuts down extension on the affected strands and steadily reduces the signal, 
ultimately to the point where it is not detectable with the desired accuracy. In this case, the 
ability to eliminate the effects of extension failure allows the experimenter great flexibility to 
alter the reaction conditions in such a way that misincorporation is minimized, at the expense of 
an increased incidence of extension failure. Misincorporation frequency depends in part on the 
concentration of the probing dNTP's and the reaction time allowed. Longer reaction times, or 
higher dNTP concentrations result in an increased probability of misincorporation, but a reduced 
incidence of extension failure. Therefore, if a higher level of extension failure can be tolerated 
due to, for example, the computer-aided signal discrimination and dNTP cycle-reversal healing 
methods, then reaction times and/or dNTP reagent concentrations can be reduced to minimize 
misincorporation, with the resulting increase in extension failure being countered by the 
computer-aided signal discrimination and/or dNTP cycle-reversal healing techniques described
above. 

[0092] If the deoxyribonucleotides used for the polymerase reaction are impure, a small
fraction of strands will extend when the main nucleotide is incorrect and produce a population of
leading, rather than trailing, error strands. As with the trailing strands, the leading strand
population is never more than three bases, nor less than one base, ahead of the main population,
unless a second error occurs on the same strand, and also, regardless of where an incorrect
extension by an impurity dNTP occurs, the leading strands are all in phase with each other. A
given base site can be probed either 1, 2 or 3 times with an incorrect dNTP before it must be
extended by the correct dNTP, so on average twice. If each of the incorrect dNTP's is
assumed to carry the same percentage of dNTP impurity, then the probability of incorrect
extension by, e.g., 99% pure dNTP containing the correct complementary base as an impurity is
1% ÷ 3 (only 1/3 of the impurity will be the correct complementary base) × 2 (average 2
incorrect trials between each correct extension), that is, 0.67%. 

[0093] As with trailing strands, the leading strand population can produce out-of-phase 
extension signals that complicate the readout of the majority strand sequence, as shown in Figure 
15. Because the sequence downstream of the 3' terminus of the majority strands is not known at 
the time of extension of those strands, the signal due to leading strand extension cannot
immediately be corrected for, nor can an altered dNTP cycle be calculated which would
automatically heal the gap between majority and leading strands for a given template sequence.
However, similar methods can be used to ameliorate the effects of a leading strand population.
First, as with trailing strands, reversal of the dNTP probe cycle automatically heals the gap
between leading and majority strand populations whenever the gap shrinks to a single base.
Therefore, arbitrary reversal of the dNTP probe cycle has a 1/3 probability of healing the gap for
a given sequence, or will heal 1/3 of the sequences in a large population of sequences probed in
parallel. Continued arbitrary reversal eventually heals roughly two-thirds of such gaps. Second,
although the sequence downstream of the 3' terminus of the majority strands is not immediately
known, information about this sequence becomes available as soon as the majority strands 
traverse the gap region. Therefore, for each extension of the majority strands it is possible, 
ideally using a computer simulation, to calculate when the leading strand population would have 
traversed that base and thus the signal by which a prior extension of the majority strands would 
have been contaminated. In this way the majority strand extension signals can retrospectively be 
corrected for leading strand signals. 

[0094] There are important aspects to leading strand creation that reduce the frequency of 
occurrence of leading strand events. First, if the concentration of impurity dNTP's is sufficiently 
low, a leading strand population cannot be created by impurity extension of the first base of a 
repeat. This is because the probability of incorrect incorporation of two impurity bases on the 
same strand in the same reaction cycle is the square of the probability for a single incorporation, 
and therefore vanishingly small for small impurity levels. Therefore, whenever the correct dNTP
for extension of the repeat is supplied, all strands will be extended through the entire repeat,
regardless of whether some fraction of the strands were already
partially extended by one base of the repeat. Second, not all incorrect extensions result in a 
permanent phase difference. For a permanent phase difference to result, a second extension (by a 
correct base) must occur on the leading strand before the main strands extend to catch up to the 
leading strand. Labeling the next four sites along the template sequence: 1, 2, 3, 4, then, by
definition, if a leading strand is created by incorporation of an impurity base on site 1 while the 
majority of the strands do not extend, the main nucleotide supplied is incorrect for extension at 
site 1. If the main nucleotide supplied is correct for extension at site 2, a 2-base lead is created. 
There is 1 chance in 4 that the reaction chamber contains the correct nucleotide for site 2, so the 
probability of creating a 2-base extension in a single step (with an impurity extension followed 
by a correct extension) is 1/4 the probability of the impurity extension alone. For the 0.67% 
impurity extension probability cited above, this means a 0.16% probability of creating a 2-base 
extension in a single cycle. 

[0095] However, if the main nucleotide supplied is incorrect for further extension at site 2, 
and, by definition, incorrect for extension at site 1, then for the lead to become fixed, the correct
nucleotide for site 2 must be supplied before the correct nucleotide to extend at site 1. The 
probability that site 2 will extend before site 1 is therefore 50%; for a 0.67% impurity extension 
probability, the probability that this creates a fixed lead due to a second extension by a correct 
nucleotide is 0.33%. Overall, a 1% impurity level results in ~ 0.5% probability of creating a 
leading strand in any given reaction trial. 
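The arithmetic of paragraphs [0092] to [0095] can be reproduced as follows. The sketch simply adds the two routes to a fixed lead described above, as the text does when it arrives at about 0.5%; the function names are hypothetical.

```python
def impurity_extension_probability(impurity):
    """[0092]: 1/3 of the impurity is the complementary base, ~2 incorrect trials per site."""
    return impurity / 3 * 2


def fixed_lead_probability(impurity):
    """[0094]-[0095]: the two routes to a permanent lead, added as in the text."""
    p_imp = impurity_extension_probability(impurity)
    two_base_lead = p_imp * (1 / 4)   # main dNTP happens to be correct for site 2
    second_extension = p_imp * 0.5    # site 2 is offered before site 1
    return two_base_lead + second_extension


p = 0.01  # 99% pure dNTPs
print(f"{impurity_extension_probability(p):.2%}")  # 0.67%
print(f"{fixed_lead_probability(p):.2%}")          # 0.50%
```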

[0096] Preparation of specific embodiments in accordance with the present invention will now 
be described in further detail. These examples are intended to be illustrative and the invention is 
not limited to the specific materials and methods set forth in these embodiments. 
Example 1

[0097] A microcalorimetric experiment was performed which demonstrates for the first time
the successful thermal detection of a DNA polymerase reaction. The results are shown in Figure
3. Approximately 20 units of T7 Sequenase was injected into a 3 mL reaction volume containing
approximately 20 nmol of DNA template and complementary primer, and an excess of dNTPs.
The primer was extended by 52 bases, the expected length given the size of the template.
Using a commercial microcalorimeter (TAM Model 2273; Thermometric, Sweden) a reaction
enthalpy of 3.5-4 kcal per mole of base was measured (Figure 3). This measurement is well
within the value required for thermal detection of DNA polymerase activity. This measurement
also demonstrates the sensitivity of thermopile detection, as the maximum temperature rise in the
reaction cell was 1 × 10⁻³ °C. The lower trace seen in Figure 3 is from a reference cell showing the
injection artifact for an enzyme-free injection into buffer containing no template system. 

Example 2 

[0098] To illustrate the utility of mutant T4 polymerases, two primer extension assays were 
performed with two different mutant T4 polymerases, both of which are exonuclease deficient.
In one mutant, Asp112 is replaced with Ala and Glu114 is replaced with Ala (D112A/E114A).
The exonuclease activity of this mutant on double-stranded DNA is reduced by a factor of about
300 relative to the wild type enzyme as described by Reha-Krantz and Nonay (1993, J. Biol.
Chem. 268:27100-27108). In a second polymerase mutant, in addition to the D112A/E114A
amino acid substitutions, a third substitution replaces Ile417 with Val (I417V/D112A/E114A).
The I417V mutation increases the accuracy of synthesis by this polymerase (Stocki, S.A. and
Reha-Krantz, L.J., 1995, J. Mol. Biol. 245:15-28; Reha-Krantz, L.J. and Nonay, R.L., 1994, J.
Biol. Chem. 269:5635-5643).

[0099] Two separate primer extension reactions were carried out using each of the polymerase 
mutants. In the first, only a single correct nucleotide, dGTP, corresponding to a template C was 
added. The next unpaired template site is a G, so that misincorporation would result in formation
of a G·G mispair. A G·G mispair tends to be among the most difficult mispairs for polymerases
to make. In the second primer extension reaction, two nucleotides, dGTP and dCTP,
complementary to the first three unpaired template sites were added. Following correct
incorporation of dGMP and dCMP, the next available template site is a T. Formation of C·T
mispairs tends to be very difficult, while G·T mispairs tend to be the most frequent mispairs made
by polymerases.

[00100] Time courses for primer extension reactions by both mutant T4 polymerases are shown 
in Figure 4. Low concentrations of T4 polymerase relative to primer/template (p/t) were used so 
that incorporation reactions could be measured on convenient time scales (60 min). By 64 
minutes 98% of the primers were extended. In reactions containing only dGTP, both 
polymerases nearly completely extended primer ends by dGMP without any detectable 
incorporation of dGMP opposite G. In reactions containing both dGTP and dCTP, both
polymerases nearly completely extended primer ends by addition of one dGMP and two dCMP's.
A small percentage (~1%) of misincorporation was detectable in the reaction catalyzed by the
D112A/E114A mutant. Significantly, no detectable misincorporation was seen in the reaction
catalyzed by the I417V/D112A/E114A mutant.
Example 3

[00101] In accordance with the invention a fluorescent tag may be attached to the nucleotide 
base at a site other than the 3' position of the sugar moiety. Chemistries for such tags which do
not interfere with the activity of the DNA polymerase have been developed as described by 
Goodwin et al. (1995, Experimental Technique of Physics 41:279-294). Generally the tag is 
attached to the base by a linker arm of sufficient length to move the bulky tag out of the active 
site of the enzyme during incorporation. 

[00102] As illustrated in Figure 5, a nucleotide can be connected to a fluorophore by a 
photocleavable linker, e.g., a benzoin ester. After the tagged dNMP is incorporated onto the 3'
end of the DNA primer strand, the DNA template system is illuminated by light at a wavelength
corresponding to the absorption maximum of the fluorophore and the presence of the fluorophore
is signaled by detection of fluorescence at the emission maximum of the fluorophore. Following
detection of the fluorophore, the linker may be photocleaved to produce compound 2; the result
is an elongated DNA molecule with a modified but non-fluorescent nucleotide attached. Many
fluorophores, including, for example, a dansyl group or acridine, can be employed in the
methodology illustrated by Figure 5.

[00103] Alternatively, the DNA template system is not illuminated to stimulate fluorescence. 
Instead, the photocleavage reaction is carried out to produce compound 2 releasing the 
fluorophore, which is removed from the template system into a separate detection chamber. 
There the presence of the fluorophore is detected as before, by illumination at the absorption 
maximum of the fluorophore and detection of emission near the emission maximum of the 
fluorophore. 
Example 4

[00104] In a specific embodiment of the invention, a chemiluminescently tagged dNTP can
consist of a chemiluminescent group (the dioxetane portion of compound 4), a chemically
cleavable linker (the silyl ether), and an optional photocleavable group (the benzoin ester) as
depicted in Figure 6. The cleavage of the silyl ether by a fluoride ion produces detectable
chemiluminescence as described in Schaap et al. (1991, "Chemical and Enzymatic Triggering
of 1,2-dioxetanes: Structural Effects on Chemiluminescence Efficiency" in Bioluminescence &
Chemiluminescence, Stanley, P.E. and Kricka, L.J. (Eds), Wiley, N.Y. 1991, pp. 103-106). In
addition, the benzoin ester that links the
nucleoside triphosphate to the silyl linker is photocleavable as set forth in Rock and Chan (1996, 
J. Org. Chem. 61: 1526-1529); and Felder, et al. (1997, First International Electronic Conference 
on Synthetic Organic Chemistry, Sept. 1-30). Having both a chemiluminescent tag and a 
photocleavable linker is not always necessary; the silyl ether can be attached directly to the 
nucleotide base and the chemiluminescent tag is destroyed as it is read. 

[00105] As illustrated in Figure 6 with respect to compound 3, treatment with fluoride ion 
liberates the phenolate ion of the adamantyl dioxetane, which is known to chemiluminesce with 
high efficiency (Bronstein et al., 1991, "Novel Chemiluminescent Adamantyl 1,2-dioxetane
Enzyme Substrates," in Bioluminescence & Chemiluminescence, Stanley, P.E. and Kricka, L.J.
(eds), Wiley, N.Y. 1991 pp. 73-82). The other product of the reaction is compound 4, which is 
no longer chemiluminescent. Compound 4 upon photolysis at 308-366 nm liberates compound 
2. 
[00106] The synthesis of compound 1 is achieved by attachment of the fluorophore to the
carboxyl group of the benzoin, whose α-keto hydroxyl group is protected by
9-fluorenylmethoxycarbonyl (FMOC), followed by removal of the FMOC protecting group and
coupling to the nucleotide bearing an activated carbonic acid derivative at its 3' end. Compound
4 is prepared via coupling of the vinyl ether form of the adamantyl phenol to
chloro(3-cyanopropyl)dimethylsilane, reduction of the cyano group to the amine, generation of the
oxetane, and coupling of this chemiluminescence precursor to the nucleotide bearing an activated 
carbonic acid derivative at its 3' end. 

[00107] The chemiluminescent tag can also be attached to the dNTP by a cleavable linkage and 
cleaved prior to detection of chemiluminescence. As shown in Figure 7, the benzoin ester 
linkage in compound 3 may be cleaved photolytically to produce the free chemiluminescent 
compound 5. Reaction of compound 5 with fluoride ion to generate chemiluminescence may 
then be carried out after compound 5 has been flushed away from the DNA template primer in 
the reaction chamber. As an alternative to photolytic cleavage, the tag may be attached by a 
chemically cleavable linker which is cleaved by chemical processing which does not trigger the 
chemiluminescent reaction. 

Example 5 

[00108] In this example, the nucleotide sequence of a template molecule comprising a portion 
of DNA of unknown sequence is determined. The DNA of unknown sequence is cloned into a 
single stranded vector such as M13. A primer that is complementary to a single stranded region
of the vector immediately upstream of the foreign DNA is annealed to the vector and used to
prime synthesis in reactive sequencing. For the annealing reaction, equimolar amounts of primer
and template (calculated based on the approximation that one base contributes 330 g/mol to the
molecular weight of a DNA polymer) are mixed in a buffer consisting of 67 mM TrisHCl pH 8.8,
16.7 mM (NH4)2SO4, and 0.5 mM EDTA. This buffer is suitable both for annealing DNA and
subsequent polymerase extension reactions. Annealing is accomplished by heating the DNA
sample in buffer to 80°C and allowing it to slowly cool to room temperature. Samples are
briefly spun in a microcentrifuge to remove condensation from the lid and walls of the tube. To
the DNA is added 0.2 mol equivalents of T4 polymerase mutant I417V/D112A/E114A and
buffer components so that the final reaction cell contains 67 mM TrisHCl pH 8.8, 16.7 mM
(NH4)2SO4, 6.7 mM MgCl2 and 0.5 mM dithiothreitol. The polymerase is then queried with
one dNTP at a time at a final concentration of 10 µM. The nucleotide is incubated with
polymerase at 37°C for 10 s. Incorporation of dNTPs may be detected by one of the methods
described above including measuring fluorescence, chemiluminescence or temperature change. 
The reaction cycle will be repeated with each of the four dNTPs until the complete sequence of 
the DNA molecule has been determined. 
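The quantities in this example follow from the stated approximation of 330 g/mol per base: primer and template are combined in equal molar amounts, and the polymerase is added at 0.2 mol equivalents. The sketch below does that arithmetic; the 1 µg template amount, the 7,250-nucleotide template length and the 20-nucleotide primer are hypothetical example values, and the function names are illustrative.

```python
GRAMS_PER_MOL_PER_BASE = 330.0  # approximation stated in Example 5


def pmol(mass_ng, n_bases):
    """Picomoles of a DNA strand from its mass (ng) and length (nucleotides)."""
    return mass_ng * 1e-9 / (n_bases * GRAMS_PER_MOL_PER_BASE) * 1e12


def equimolar_mass_ng(pmol_needed, n_bases):
    """Mass (ng) of a strand of n_bases nucleotides that supplies pmol_needed."""
    return pmol_needed * 1e-12 * n_bases * GRAMS_PER_MOL_PER_BASE * 1e9


# Hypothetical example: 1 ug of a 7,250-nt single-stranded template, 20-nt primer.
template_pmol = pmol(1000.0, 7250)
primer_ng = equimolar_mass_ng(template_pmol, 20)
polymerase_pmol = 0.2 * template_pmol      # 0.2 mol equivalents of polymerase
print(f"template {template_pmol:.3f} pmol, primer {primer_ng:.2f} ng, "
      f"polymerase {polymerase_pmol:.3f} pmol")
```

Because both strands are assigned the same mass per base, the equimolar primer mass is simply the template mass scaled by the ratio of their lengths.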

Example 6 

[00109] Figure 7 illustrates a mechanical fluorescent sequencing method in accordance with the 
invention. A DNA template and primer are captured onto beads 18 using, for example, 
avidin-biotin or -NH2/N-hydroxysuccinimide chemistry and loaded behind a porous frit or filter
20 at the tip of a micropipette 22 or other aspiration device as shown in Figure 7(a), step 1.
Exonuclease-deficient polymerase enzyme is added and the pipette tip is lowered into a small
reservoir 24 containing a solution of fluorescently-labeled dNTP. As illustrated in step 2 of
Figure 7(a), a small quantity of dNTP solution is aspirated through the filter and allowed to react
with the immobilized DNA. The dNTP solution also contains approximately 100 nM polymerase
enzyme, sufficient to replenish rinsing losses. After reaction, as shown in step 3, the excess 
dNTP solution 24 is forced back out through the frit 20 into the dNTP reservoir 24. In step 4 of 
the process the pipette is moved to a reservoir containing buffer solution and several aliquots of 
buffer solution are aspirated through the frit to rinse excess unbound dNTP from the beads. The 
buffer inside the pipette is then forced out and discarded to waste 26. The pipette is moved to a 
second buffer reservoir (buffer 2), containing the chemicals required to cleave the fluorescent tag 
from the incorporated dNMP. The reaction is allowed to occur to cleave the tag. As shown in 
step 5 the bead/buffer slurry with the detached fluorescent tag in solution is irradiated by a laser 
or light source 28 at a wavelength chosen to excite the fluorescent tag, the fluorescence is 
detected by fluorescence detector 30 and quantified if incorporation has occurred. 

[00110] Subsequent steps depend on the enzyme strategy used. If a single-stage strategy with an 
exonuclease-deficient polymerase is used, as illustrated in Figure 7(b), the solution containing
the detached fluorescent tag is expelled and discarded to waste (step 6), followed by a
further rinse step with buffer 1 (step 7), which is thereafter discarded (step 8), and the pipette is
moved to a second reservoir containing a different dNTP (step 9) and the process repeats starting 
from step 3, cycling through all four dNTPs. 

[00111] In a two-stage strategy, after the correct dNTP has been identified and the repeat length 
quantified in step 5, the reaction mixture is rinsed as shown in steps 6, 7, and 8 of Figure 7(b) 
and the pipette is returned to a different reservoir containing the same dNTP (e.g., dNTP1) as
shown in step (a) of Figure 8, to which a quantity of exonuclease-proficient polymerase has been
added, and the solution is aspirated for a further stage of reaction which proof-reads the prior
extension and correctly completes the extension. This second batch of dNTP need not be
fluorescently tagged, as the identity of the dNTP is known and no sequence information will be
gained in this proof-reading step. If a tagged dNTP is used, the fluorescent tag is preferably
cleaved and discarded as in step 5 of Figure 7(a) using Buffer 2. Alternatively, the initial
incorporation reaction shown in step 2 of Figure 7(a) is carried out for long enough, and the
initial polymerase is accurate enough, so that the additional amount of fluorescent tag
incorporated with dNTP1 at step a of Figure 8 is small and does not interfere with quantification
of the subsequent dNTP. Following proof-reading in step a of Figure 8, excess dNTP is expelled
(step b) and the reaction mixture is rinsed (steps c, d) with a high-salt buffer to dissociate the
exo+ polymerase from the DNA primer/template. It is important not to have
exonuclease-proficient enzyme present if the DNA primer/template is exposed to an incorrect
dNTP. The pipette is then moved, in step e, to a reservoir containing a different dNTP, and
the process is repeated, again cycling through all four dNTPs.

Example 7 

[00112] A new process for destruction of a fluorophore signal which involves reaction of the 
electronically excited fluorophore with an electron-abstracting species, such as 
diphenyliodonium salts, is described. 

[00113] The reaction of a diphenyliodonium ion with an electronically excited fluorescein 
molecule is illustrated in Figure 10. The diphenyliodonium ion extracts an electron from the 
excited state of the fluorescein molecule producing a radical ion of the fluorescein molecule and 
a neutral diphenyliodonium free radical. The diphenyliodonium free radical rapidly decomposes 
to iodobenzene and a phenyl radical. The fluorescein radical ion then either reacts with the 
phenyl radical or undergoes an internal rearrangement to produce a final product which is no
longer fluorescent. 

[00114] Figures 11 and 12 provide evidence for the specific destruction of fluorescein by
diphenyliodonium ion. In Figure 11, fluorescence spectra are presented for a mixture of fluorescein
and tetramethylrhodamine dyes, before and after addition of a solution of diphenyliodonium 
chloride. It is seen that the fluorescence from the fluorescein dye is immediately quenched, 
demonstrating electron abstraction from the excited state of the molecule while the fluorescence 
from the rhodamine is unaffected, apart from a small decrease due to the dilution of the dye 
solution by the added diphenyliodonium chloride solution. 

[00115] Elimination of the fluorescent signal from the fluorescein dye by diphenyliodonium 
chloride is not in itself proof that the fluorescein molecule has been destroyed, because electron 
abstraction from the excited state of fluorescein effectively quenches the fluorescence, and 
quenching need not result in destruction of the fluorescein molecule. However, Figure 12 
demonstrates that the fluorescein molecule is destroyed by reaction with the diphenyliodonium
and not simply quenched. Figure 12 shows the ultraviolet (UV) absorption spectra for a
fluorescein solution before and after addition of a solution of diphenyliodonium chloride.
Spectrum 1 is the UV absorption spectrum of a pure fluorescein solution. Spectrum 2 is the UV 
absorption of the fluorescein solution following the addition of a solution containing a molar 
excess of diphenyliodonium (DPI) chloride and exposure to a single flash from a xenon camera 
strobe. The data show that fluorescein is essentially destroyed by the photochemical reaction 
with the DPI ion. Figure 12 provides clear evidence that diphenyliodonium chloride not only 
quenches the fluorescence from the fluorescein dye but destroys the molecule to such an extent
that it can no longer act as a fluorophore. 

[00116] An experiment was performed to demonstrate efficient fluorescent detection and 
destruction of fluorophore using a template sequence. The template, synthesized with an
alkylamino linker at the 5' terminus, was:

3'-H2N-(CH2)7-GAC CAT TAT AGG TCT TGT TAG GGA AAG GAA GA-5'

[00117] The trial sequence to be determined is: G GGA AAG GAA GA. 

[00118] A tetramethylrhodamine-labeled primer sequence was synthesized to be complementary
to the template as follows:

5'-[Rhodamine]-(CH2)6-CTG GTA ATA TCC AGA ACA AT-3'
[00119] The alkylamino-terminated template molecules were chemically linked to Sepharose 
beads derivatized with N-hydroxysuccinimide and the rhodamine-labeled primer was annealed to 
the template. The beads with attached DNA template and annealed primer were loaded behind a 
B-100 disposable filter in a 5-mL syringe. A volume containing a mixture of fluorescein-labeled
and unlabeled dCTP in a ratio of 1:2 and exonuclease-deficient polymerase enzyme in a reaction
buffer as specified by the manufacturer was drawn into the syringe. Reaction was allowed to
proceed for 20 minutes at 35°C. After the reaction, the fluid was forced out of the syringe,
retaining the beads with the reacted DNA behind the filter, and three washes with double- 
distilled water were performed by drawing water through the filter into the syringe and expelling 
it. The beads were resuspended in phosphate buffer, the filter was removed and the suspension 
was dispensed into a cuvette for fluorescence analysis. Following fluorescence analysis, the 
bead suspension was loaded back into the syringe which was then fitted with a filter tip, and the
phosphate buffer was dispensed. A solution of DPI was drawn up into the syringe with a
concentration calculated to be in 1:1 molar equivalence to the theoretical amount of DNA 
template, the filter was removed and the bead suspension was dispensed into a cuvette for UV 
light exposure for 15 minutes. The suspension was recollected into a syringe, the filter was 
reattached, the DPI solution was expelled, and the beads were resuspended by drawing up 0.7 
mL of phosphate buffer. After removal of the filter the bead suspension was dispensed into a 
clean cuvette for fluorescence analysis to check the completeness of destruction of the 
fluorescein by the reaction with the DPI. A subsequent polymerase reaction was performed 
using the same protocol with labeled dTTP and similarly measured for fluorescence. 

[00120] Figure 13 demonstrates the results of the polymerase reactions, with photochemical 
destruction of the fluorescein label by DPI following each nucleotide incorporation reaction. 
Curve 1 shows rhodamine fluorescence following annealing of the rhodamine-labeled primer to
the beads, demonstrating covalent attachment of the template strands to the beads and capture of 
the rhodamine-labeled primer strands. Curve 2 demonstrates detection of fluorescein following 
polymerase-catalyzed incorporation of three partially fluorescein-labeled dCMPs onto the 3' 
terminus of the primer strands. Curve 3 shows complete destruction of the incorporated 
fluorescein label by photo-induced reaction with diphenyliodonium chloride. Loss of rhodamine 
signal here is attributed to loss of a significant fraction of the beads which stuck to the filter 
during washes. Curve 4 shows detection of a new fluorescein label following photochemical 
destruction of the fluorescein attached to the dCMPs and subsequent polymerase-catalyzed
incorporation of three partially fluorescein-labeled dTMPs onto the 3' terminus of the primer 
strands. 
[00121] The following methods were utilized to demonstrate successful destruction of a
fluorescein-labeled dTMP. 

[00122] Sepharose beads were purchased from Amersham with surfaces derivatized with N- 
hydroxysuccinimide for reaction with primary amine groups. The alkylamino-terminated 
templates were chemically linked to the Sepharose beads using the standard procedure 
recommended by the manufacturer. 

[00123] The beads with attached template were suspended in 250 mM Tris buffer containing
250 mM NaCl and 40 mM MgCl2. The solution containing the primer strands was added and the
mixture heated to 80°C and cooled over ~ 2 hours to anneal the primers to the surface- 
immobilized DNA template strands. 

[00124] Fluorescein-labeled dUTP and dCTP were purchased from NEN Life Science Products. 
Unlabeled dTTP and dCTP were purchased from Amersham. 

[00125] Prior to any reaction, the annealed primer/template was subjected to fluorescence 
analysis to ensure that annealing had occurred. The excitation wavelength used was 320 nm and 
fluorescence from fluorescein and rhodamine was detected at ~520 nm and ~580 nm
respectively. 

[00126] Reagent volumes were calculated on the assumption that the DNA template was 
attached to the beads with 100% efficiency. 

[00127] The 5X reaction buffer contained: 

1) 250 mM Tris buffer, pH 7.5
2) 250 mM NaCl
3) 40 mM MgCl2
4) 1 mg/mL BSA
5) 25 mM dithiothreitol (DTT)

mixed and brought to volume with double-distilled H2O.

[00128] T4 DNA polymerase was obtained from Worthington Biochemical Corp. The 
polymerase was dissolved in the polymerase buffer according to the manufacturer's protocols. 

[00129] Fluorescein-labeled and unlabeled dCTPs were mixed in a ratio of 1:2.

[00130] The reaction was run in a 5 mL syringe (Becton Dickinson) fitted with a B-100 
disposable filter (Upchurch Scientific). This limits the reaction volume to 5 mL total: 



Component                        Volume
Primer/template suspension       0.7 mL
T4 DNA polymerase                1.0 mL
FdCTP/dCTP                       0.040 mL
5X reaction buffer               2.0 mL
Double-distilled H2O             1.0 mL



[00131] The reaction was allowed to proceed in a 35°C oven for 20 minutes. Following 
reaction, the fluid was forced out of the syringe allowing the filter to retain the beads with the 
reacted DNA. Three washes with double-distilled water were performed. All waste was 
collected and saved for future reuse. The beads were resuspended in 0.7 mL of phosphate buffer, 
the filter was removed and the suspension was dispensed into a cuvette for fluorescence analysis. 

[00132] Following fluorescence analysis the bead suspension was collected into a 1 mL syringe 
(Becton Dickinson) which was then fitted with a filter tip. The phosphate buffer was dispensed 
and the waste collected. A solution of diphenyliodonium chloride (DPI) was drawn up with a 
concentration calculated to be in 1:1 molar equivalence to the theoretical amount of DNA 
template (i.e. DPI was present in excess of the incorporated fluorescein-labeled dCTP). The 
filter was removed and the bead suspension with added DPI was dispensed into a cuvette and
exposed to UV light for 15 minutes. The suspension was recollected into a syringe, the filter 
reattached, the DPI solution was dispensed and the beads were resuspended in 0.7 mL of 
phosphate buffer. The bead suspension was dispensed into a clean cuvette for fluorescence 
analysis. 

[00133] It should be noted that a significant fraction of the beads used in this procedure 
appeared to become stuck in the filter on the syringe. This resulted in a significant increase in 
the pressure needed to force fluids through the filter as it became clogged by the beads, and more 
importantly reduced the amount of DNA available for fluorescent detection of incorporated 
nucleotides and reduced the weak rhodamine signal from the labeled primer to the point where it 
was no longer detectable. 

[00134] Following the successful incorporation reaction with dCTP, a subsequent polymerase 
reaction was run to incorporate dTTP. The incorporated fluorescein-labeled dTMP was detected, 
but with significantly lower intensity due to the losses of the beads in the filter in the multiple 
transfer steps between the reaction syringe and the analysis cuvette. The lowered signal could 
also result in part from a different labeling efficiency of the dTTP and a different incorporation 
efficiency for the labeled nucleotide in the polymerase reaction. Because the rhodamine signal 
was no longer detectable following the second incorporation reaction it was not possible to 
correct for bead losses. 

[00135] The results are shown in Figure 13. The data represented by the curves were obtained 
sequentially as follows: 
[00136] Curve 1 shows the rhodamine fluorescence following annealing of the rhodamine-labeled
primer to the bead-immobilized DNA template.

[00137] Curve 2 demonstrates detection of the fluorescein-labeled dCTP following
polymerase-catalyzed incorporation of three dCMPs onto the 3' terminus of the primer strands.

[00138] Curve 3 demonstrates complete destruction of the incorporated fluorescein label on the
dCMPs by photo-induced reaction with diphenyliodonium chloride. In this instance, the
rhodamine label also has vanished; this is primarily because a significant fraction of the beads 
were lost by sticking in the filter used in the reagent flushing operation. It is possible that the 
rhodamine also was destroyed by the DPI photochemical reaction. 

[00139] Curve 4 demonstrates detection of a new fluorescein label following photochemical 
destruction of the fluorescein label on the dCMPs and polymerase-catalyzed incorporation of
three fluorescein-tagged dTMPs onto the 3' terminus of the primer strands. The lower signal
compared to curve 2 results mainly from the bead losses in the syringe, but may also reflect a 
lower incorporation efficiency of the dTMP and/or a lower labeling efficiency. Because the 
rhodamine signal from the labeled primer is no longer detectable, the bead losses cannot be 
calibrated. 

[00140] The results shown here demonstrate the concept of reactive sequencing by fluorescent 
detection of DNA extension followed by photochemical destruction of the fluorophore, which 
allows further extension and detection of a subsequent added fluorophore. This cycle can be 
repeated a large number of times if sample losses are avoided. In practical applications of this 
approach, such losses will be avoided by attaching the primer or template strands to the fixed 
surface of an array device, for example a microscope slide, and transferring the entire array
device between a reaction vessel and the fluorescent reader. 

Example 8 

[00141] Read length is defined as the maximum length of DNA sequence that can be read 
before uncertainties in the identities of the DNA bases exceed some defined level. In the reactive 
sequencing approach, read length is limited by two types of polymerase failures: 
misincorporation, i.e., incorrectly incorporating a noncomplementary base, and extension failure, 
i.e., failure to extend some fraction of the DNA primer strands on a given cycle in the presence 
of the correct complementary base. Example 2 demonstrated that reaction conditions can be 
optimized such that neither type of failure affects more than ~ 1% of the arrayed strands for any 
given incorporation reaction. Neither type of failure directly produces an error signal in the 
sequence readout, because neither a 1% positive signal, for a misincorporation, nor a 1% 
decrease in the signal for a correct incorporation, in the case of extension failure, will be 
significant compared to the signals anticipated for a correct incorporation. However, 
accumulated failures limit the read length in a variety of different ways. 

[00142] For example, misincorporation inhibits any further extension on the affected strand 
resulting in a reduction in subsequent signals. It is estimated that the probability of continuing to 
extend a given strand following a misincorporation is no greater than 0.1%, so that any 
contribution to the fluorescent signal resulting from misincorporation followed by subsequent 
extension of the error strand will be negligible. Instead, the accumulation of misincorporations 
resulting in inhibition of strand extension ultimately reduces the overall signal amplitude for 
correct base incorporation to a level at which noise signals in the detection system begin to have
a significant probability of producing a false signal that is read as a true base incorporation. 

[00143] Extension failures typically arise due to the kinetics of the extension reaction and 
limitations on the amount of time allotted for each extension trial with the single 
deoxynucleotide triphosphates (dNTP's). When reaction is terminated by flushing away the 
dNTP supply, a small fraction of the primer strands may remain unextended. These strands on 
subsequent dNTP reaction cycles will continue to extend but will be out of phase with the 
majority strands, giving rise to small out-of-phase signals, i.e., signals indicating a positive incorporation
for an added dNTP which is incorrect for extension of the majority strands. Because extension
failure can occur, statistically, on any extension event, the out-of-phase signals will increase as
the population of strands with extension failures grows. If reaction conditions are chosen so that
the reaction is 99.9% complete on a given reaction cycle, for example, after a further number, N,
of successful extension reactions, the out-of-phase signal will be approximately $1 - 0.999^{N}$. The
number N at which the out-of-phase signal becomes large enough to be incorrectly read as a 
correct extension signal is the read length. For example, after extension by 200 bases with 99.9% 
completion, the out-of-phase signal is approximately 18% of the in-phase signal, for a single 
base extension in either case. After extension by 400 bases the out-of-phase signal grows to 33%. 
The point at which the read must terminate is dictated by the ability to distinguish the in-phase 
signals from the out-of-phase signals. 
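The arithmetic of the two failure modes described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the described method: it assumes that each strand independently misses an extension with probability 1 - p on each cycle (so that the out-of-phase fraction after N extensions is $1 - p^{N}$) and that every misincorporation permanently terminates its strand. The function names and the readability threshold passed to `read_length` are hypothetical.

```python
def out_of_phase_fraction(completion: float, n_extensions: int) -> float:
    """Fraction of strands that have missed at least one extension after
    n_extensions cycles, if each cycle reaches `completion` (e.g. 0.999)."""
    return 1.0 - completion ** n_extensions


def surviving_fraction(misincorporation_rate: float, n_extensions: int) -> float:
    """Fraction of strands still extendable, assuming every misincorporation
    terminates further extension on that strand."""
    return (1.0 - misincorporation_rate) ** n_extensions


def read_length(completion: float, max_out_of_phase: float) -> int:
    """Largest N whose out-of-phase fraction does not exceed max_out_of_phase.
    The threshold is a hypothetical, user-chosen readability limit."""
    if not 0.0 < completion < 1.0:
        raise ValueError("completion must lie strictly between 0 and 1")
    if not 0.0 <= max_out_of_phase < 1.0:
        raise ValueError("max_out_of_phase must lie in [0, 1)")
    n = 0
    while out_of_phase_fraction(completion, n + 1) <= max_out_of_phase:
        n += 1
    return n


for n in (200, 400):
    print(f"N = {n}: out-of-phase fraction ~ {out_of_phase_fraction(0.999, n):.2f}")
```

With 99.9% completion the loop prints about 0.18 for N = 200 and 0.33 for N = 400, the figures quoted above; under the same assumptions `read_length(0.999, 0.5)` returns 692.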

[00144] In what follows, a length of single base repeats, e.g. AAAAA, is treated as a single base 
for the purposes of discussing the phase difference between strands. If the reaction cycle of the 
four dNTPs is unchanged, then a primer strand which has failed to extend when the correct
dNTP, for example dATP, is in the reaction cell cannot trail the leading, i.e., majority strands,
which did extend correctly, by more than 3 bases because the fourth base in the dNTP reaction 
cycle will always once again be the correct base (dATP) for the strand which failed to extend 
previously. It is assumed that extension failure is purely statistical, and that any strand which 
fails to extend has an equal chance of subsequent extension when the correct dNTP is supplied, 
and that this extension probability is sufficiently high that the chance of repeated extension 
failures on the same strand is vanishingly small. For example, if the probability of extension 
failure on a single strand is 0.1%, the probability of two extension failures on the same strand is 
$(0.001)^{2}$, or $10^{-6}$. Similarly, the trailing strand can never resynchronize with the leading strands if
extension subsequently proceeds correctly, because the leading strands will always have 
extended by at least one more nucleotide - G, T, or C in the example discussion of an A 
extension failure - before the trailing strand can add the missing A. The effect is that after each 
complete dNTP cycle the trailing strands always follow the leading strands by an extension 
amount that represents the bases added in one complete dNTP cycle at a given point in the 
sequence. These observations predict that: (i) the gap between the leading and trailing strands 
perpetually oscillates between 1 and 3 bases and can never increase unless a second extension 
failure occurs on the same strand; and (ii) the gap between the leading and trailing strands is 
independent of the position along the trailing strand at which the extension failure occurs. This 
gap at any given point in the extension of the leading strands is solely a function of the sequence 
of the leading strand population up to that point and the dNTP probe cycle. In other words, a 
population of trailing strands is produced due to random extension failure at different points in 
the sequence, but these trailing strands themselves are all exactly in phase with each other. 
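Predictions (i) and (ii) can be explored with a toy simulation. The sketch below is illustrative only: it assumes a hypothetical incorporated sequence and probe cycle, models each probe as extending a strand through an entire homopolymer run, and introduces a single extension failure at several different probes. Under these assumptions, trailing strands that failed on different probes sit at identical positions once a full cycle has elapsed after the failure. `simulate`, `runs_between` and the other names are hypothetical.

```python
from typing import List, Optional


def extend(pos: int, target: str, base: str) -> int:
    """Advance a primer 3' end through a whole homopolymer run of `base`
    (single-base repeats are incorporated in one probe)."""
    while pos < len(target) and target[pos] == base:
        pos += 1
    return pos


def simulate(target: str, cycle: str, n_probes: int,
             fail_step: Optional[int] = None) -> List[int]:
    """Primer-end position after each probe; the strand skips exactly one
    probe (a single extension failure) if fail_step is given."""
    pos, history = 0, []
    for step in range(n_probes):
        if step != fail_step:
            pos = extend(pos, target, cycle[step % len(cycle)])
        history.append(pos)
    return history


def runs_between(target: str, start: int, end: int) -> int:
    """Gap in bases, counting each homopolymer run in target[start:end] as one."""
    seg = target[start:end]
    return sum(1 for i, b in enumerate(seg) if i == 0 or b != seg[i - 1])


TARGET = "CACGTCTAGACTTGCAGGTCAATGCA"   # hypothetical bases added to the primer
CYCLE = "GCTA"                          # hypothetical unchanged probe cycle
N_PROBES = 40

lead = simulate(TARGET, CYCLE, N_PROBES)
# Probes on which the leading strands actually extended (where a failure can occur).
ext_steps = [s for s in range(N_PROBES) if lead[s] > (lead[s - 1] if s else 0)]
trailing = {k: simulate(TARGET, CYCLE, N_PROBES, fail_step=k) for k in ext_steps[:4]}

# At the end of each complete cycle after the last simulated failure,
# compare trailing strands that failed at different points.
last_fail = max(trailing)
for end in range(len(CYCLE) - 1, N_PROBES, len(CYCLE)):
    if end < last_fail + len(CYCLE) - 1:
        continue
    positions = {trailing[k][end] for k in trailing}
    lag = runs_between(TARGET, trailing[ext_steps[0]][end], lead[end])
    print(f"after probe {end + 1}: trailing positions {sorted(positions)}, lag {lag}")
```

Each printed line shows a single trailing position, consistent with the statement that all single-failure strands are exactly in phase with each other; in this toy sequence the trailing strand's position always equals the leading strand's position one full probe cycle earlier.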
[00145] Because the result of an extension failure is to produce a trailing strand population that
trails the leading strands perpetually by an amount that oscillates between one and three
nucleotides, assuming that a second extension failure does not occur on the trailing strand and
that the probing dNTP cycle remains unchanged, the gap between the leading and
trailing strand populations can always be known by tracking the leading strand sequence by, for 
example, computer simulation and simulating an extension failure event at any point along the 
sequence. 

[00146] Thus the present invention provides, first, a general method of computer tracking of the 
sequence information which allows the out-of-phase error signals due to extension of trailing 
strands to be recognized and subtracted from the correct signals, and, second, methods of altering 
the probing dNTP cycle to selectively extend the trailing strands so that they move back into 
phase with the leading strands, thus completely eliminating sequence uncertainty due to 
out-of-phase signals arising from the trailing strands that result from extension failure. 

[00147] The statistics which govern the ability to distinguish an incorrect signal from 
out-of-phase strands from a correct signal depend upon the noise level and statistical variation of 
the fluorescence signal. Assuming that the signal for a correct 1-base extension has a standard 
deviation of ±5%, then statistically 99.73% of the signals will have an amplitude between 0.85
and 1.15 (± 3 standard deviations from the average value) when the average value is 1.0 and the 
standard deviation is 0.05. If the extension signal must be at least 85% of the average single 
extension signal to register a correct extension, then statistically a correct extension will be 
missed only 0.13% of the time, i.e. the readout accuracy would be 99.87%. Another 0.13% of the 
signals for a correct extension will be greater than 1.15, but the concern is only with signals that 
are lower than average and so are more difficult to distinguish from a growing signal from
out-of-phase strands. The statistics for errors arising from out-of-phase extension of a trailing
strand are similar. If the standard deviation of the trailing strand signals is also ±5% of the mean
extension signal, which will be true whenever the trailing strand intensity approaches the leading
strand intensity, then if the trailing strand intensity does not grow beyond 0.7, the fraction of
trailing strand extensions that give rise to a signal of 0.85 or greater (3 standard deviations beyond
the mean) is approximately 0.13%. Thus an out-of-phase signal arising from a single-base extension on
one of the three sets of trailing strands should be distinguishable from the in-phase signal with
~99.87% accuracy so long as the out-of-phase signal does not grow beyond ~70% of the in-phase
signal. 
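The tail probabilities used in this paragraph can be checked numerically. This is a short sketch assuming, as the ±3 standard deviation argument implies but does not state, that the signals are normally distributed; the helper names are hypothetical.

```python
import math


def lower_tail(x: float, mean: float, sd: float) -> float:
    """P(signal < x) for a normally distributed signal."""
    return 0.5 * math.erfc((mean - x) / (sd * math.sqrt(2.0)))


def upper_tail(x: float, mean: float, sd: float) -> float:
    """P(signal >= x) for a normally distributed signal."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


MEAN, SD, THRESHOLD = 1.0, 0.05, 0.85     # values from the paragraph above

# A correct single-base extension read as "no extension".
missed = lower_tail(THRESHOLD, MEAN, SD)
# Fraction of correct signals within +/-3 standard deviations of the mean.
within_3sd = 1.0 - missed - upper_tail(MEAN + 3 * SD, MEAN, SD)

print(f"P(correct 1-base signal < {THRESHOLD}) = {missed:.4%}")
print(f"fraction within +/-3 SD = {within_3sd:.2%}")
```

The first line reports roughly 0.13% and the second roughly 99.7%. The same `upper_tail` helper gives the probability that a trailing-strand signal of any assumed mean reaches the 0.85 threshold.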

[00148] The above discussion assumes that all the extension events correspond to single base 
extensions. However, multiple single-base repeats are common in DNA sequences, thus one 
must consider the situation where the out-of-phase signal can be M times larger than that for a 
single base extension, where M is the repeat number. For example, if the population of one of 
the three sets of out-of-phase strands has grown to 20% of the leading strand population, at 
which level the in-phase and out-of-phase signals can readily be distinguished for a single base 
extension, then if this set of out-of-phase strands encounters a 5-base repeat, e.g. AAAAA, the 
signal for that repeat becomes identical in magnitude to that for a single base extension on the 
in-phase strands. Real-time computer monitoring of the extension signals permits discrimination 
against such repeat-enhanced out-of-phase signals, for example, by implementing linear and/or 
nonlinear auto-regressive moving average (ARMA) schemes. The essential points here are as
follows: (i) the out-of-phase strands are those that are trailing the majority strands as a result of
extension failure; misincorporation events which could produce leading error strands have the
effect of shutting down further extension on the affected strands and so do not give rise to 
significant out-of-phase error signals; (ii) there is always only one population of trailing strands 
regardless of where the extension failure occurred; all the primer strands in this population have 
been extended to the same point which trails the leading strand sequence by 1, 2 or 3 bases; and 
(iii) because the leading strands have always previously traversed the sequence subsequently 
encountered by the trailing strands, the sequence at least 1 base beyond the 3' terminus of the 
trailing strands is always known and allows prediction of exactly whether, and by how much, 
these trailing strands will extend for any nucleotide supplied, by simulating, in a computer for 
example, the effect of an extension failure at any point in the known sequence upstream of the 
position to which the leading strands have advanced. 

[00149] On each incorporation trial, in addition to any possible correct extension signal for the 
leading strands, there may also be an error signal corresponding to extension of the trailing 
strands. For example, let us assume that the trailing strand population has grown as large as 20% 
of the leading strand population. The size of this population can be monitored by detecting the 
incorporation signal when the trailing strands extend and the leading strands do not. Assume that 
the leading strand population has just traversed a single base repeat region on the template, for 
example AAAAA, and incorporated onto the primer the complementary T repeat: TTTTT. The 
trailing strands will not traverse this same AAAAA repeat for at least a complete cycle of the 
four probing nucleotides, until the next time the strands are probed with dTTP. Knowing the size 
of the trailing strand population from the amplitude of its incorporation signals, determined at 
any point where the leading strands do not extend but the trailing strands do, the signal to be 
expected from the trailing strand population due to the TTTTT incorporation can be calculated
precisely. If the trailing strand population is 1/5 as large as the leading strand population, for 
example, this signal will mimic incorporation of a single T on the leading strand population. In 
the absence of the computer-aided monitoring method discussed here, such a false signal would 
give rise to a drastic sequence error. 

[00150] Figures 14A and 14B demonstrate how data would appear for a sequence: [CTGA] 
GAA ACC AGA AAG TCC [T], probed with a dNTP cycle: CAGT, close to the primer where 
no extension failure has occurred (Figure 14A) and well downstream (Figure 14B) at a point 
where 60% of the strands have undergone extension failure and are producing out-of-phase 
signals, and misincorporation has shut down extension on 75% of all strands. The readouts 
shown start at the second G in the sequence (beyond the [CTGA] sequence in brackets) and
end at the last C (before the [T] in brackets). The digital nature of the signal in Figure 14A
and also the amplitude scale should be noted. In Figure 14B, the signal for a single base 
extension has been reduced by 60%, from 1.0 to 0.4 due to the extension failure strands, and by a 
further factor of 4 to 0.1 due to misincorporation and the resulting 75% signal loss. However, 
added to the correct extension signals are signals due to the out-of-phase extension of the trailing 
strands. At first sight, the readout is completely different from the correct readout shown in 
Figure 14A, due to the superposition of signals produced when the trailing strands encounter the 
sequence previously traversed by the leading strands. Particularly large errors arise whenever the 
trailing strand population encounters the AAA repeats. For example, the second T probe yields a 
signal amplitude corresponding to an AAAAA repeat instead of the correct single A, the third G 
probe gives a signal corresponding to CCC when in fact there is no C at this point in the leading 
strand sequence, the fourth T probe reads 4 A's when the correct sequence has none (the trailing
strands encounter the second AAA repeat). However, because the sequence from the leading 
strands is known, the false signals arising from the trailing strands can be predicted and 
subtracted from the total signal to obtain the correct sequence readout. This is shown in Figure 
14C, where the signals arising from the trailing strands are coded by different shading from the 
leading strand signal. Because the signals due to the trailing strands can be predicted, the error 
signals can be subtracted to obtain the correct digital sequence readout shown in Figure 14D. It 
should be noted that the data in Figure 14D are now identical to those in Figure 14A, and yield 
the correct sequence readout for the leading strands, the only difference being that the overall 
intensity is reduced due to the assumed loss of signal due to misincorporation and extension 
failure, the latter populating the trailing strands. In other words, by keeping track of the sequence 
in a computer the effect is as though one could directly visualize the different contributions as 
depicted on the plot in Figure 14C. Therefore, it is possible to predict for any probe nucleotide 
event exactly what the signal from the trailing strand population should be, and subtract this error 
signal from the measured signal to arrive at a true digital signal representative of the sequence of 
the leading strand population, which is the desired result. 
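The prediction-and-subtraction step described above can be sketched as follows. This is a simplified, noiseless model and not the procedure used to produce Figures 14A to 14D: it assumes that each probe's signal is proportional to the number of nucleotides added, a single trailing population whose size is already known, and a hypothetical incorporated sequence, probe cycle and amplitudes (`TARGET`, `CYCLE`, `LEAD_AMP`, `TRAIL_AMP`). As in [00145], the expected trailing signal is obtained by simulating an extension failure at an arbitrary upstream point; because single-failure strands fall into phase once a full cycle has passed, the guessed failure point need not match the true one.

```python
from typing import List, Optional


def extend(pos: int, target: str, base: str) -> int:
    """Advance a primer 3' end through a whole homopolymer run of `base`."""
    while pos < len(target) and target[pos] == base:
        pos += 1
    return pos


def bases_added(target: str, cycle: str, n_probes: int,
                fail_step: Optional[int] = None) -> List[int]:
    """Nucleotides added on each probe by one strand population; the
    population skips exactly one probe (fail_step) if one is given."""
    pos, added = 0, []
    for step in range(n_probes):
        base = cycle[step % len(cycle)]
        new = pos if step == fail_step else extend(pos, target, base)
        added.append(new - pos)
        pos = new
    return added


# Hypothetical inputs: bases added to the primer (complement of the template),
# probe cycle, and signal per added base for each population.
TARGET = "CATTTTTGACGTACAGGTCA"
CYCLE = "CAGT"
N_PROBES = 32
LEAD_AMP, TRAIL_AMP = 1.0, 0.2          # trailing population = 20% of leading

lead = bases_added(TARGET, CYCLE, N_PROBES)
ext = [s for s in range(N_PROBES) if lead[s] > 0]
true_fail, guessed_fail = ext[0], ext[2]    # the true failure point is unknown

# Synthetic "measured" signal: leading plus truly trailing contributions.
true_trail = bases_added(TARGET, CYCLE, N_PROBES, fail_step=true_fail)
measured = [LEAD_AMP * a + TRAIL_AMP * b for a, b in zip(lead, true_trail)]

# Prediction from the known leading sequence and a simulated failure.
sim_trail = bases_added(TARGET, CYCLE, N_PROBES, fail_step=guessed_fail)
corrected = [m - TRAIL_AMP * b for m, b in zip(measured, sim_trail)]

# Both failures are at least one full cycle old from this probe onward.
settle = max(true_fail, guessed_fail) + len(CYCLE)
for step in range(settle, N_PROBES):
    print(f"probe {step + 1} (d{CYCLE[step % len(CYCLE)]}TP): "
          f"measured {measured[step]:.2f}, corrected {corrected[step]:.2f}, "
          f"leading bases added {lead[step]}")
```

In this toy run, probe 8 (dTTP) gives a measured signal of 1.0 although the leading strands add nothing: the trailing population, one fifth the size of the leading population, is traversing a five-base T run, which is the situation described in [00149]. After subtraction, each corrected value equals the leading-strand signal.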

[00151] Given the ability to compute and subtract any trailing strand signals as discussed, the 
accuracy with which nucleotide incorporation or non-incorporation on the leading strands can be 
sensed is limited, not by the absolute size of the trailing strand signal, but instead by the noise on 
those signals. For example, assume that the signal for a single-base extension of a trailing strand 
population equal to 20% of the leading strand population is 0.2 ± 0.05. If the trailing strands 
encounter a 5-base repeat, the resulting signal would be identical in amplitude to that produced 
by a single-base extension of the leading strands, but this signal could be subtracted from the
observed signal to yield either a signal resulting from a leading strand extension, or a null signal
corresponding to no extension of the leading strands. Assuming that the noise is purely statistical
and therefore scales as the square root of the signal amplitude, for a 5-base
extension of the trailing strands or a single extension of the leading strands the signal would be
$1 \pm (0.05 \times \sqrt{5})$, i.e. 1 ± 0.11, because the statistical noise on a set of added signals grows as the
square root of the number of signals. One can subtract from this value a correction signal which
is much more accurately known because the trailing strand signal has been repeatedly measured,
yielding better statistics on this value. It is assumed that the uncertainty in the correction signal is
negligible. For no extension of the leading strands, the resulting difference signal would be 0 ± 0.11,
whereas a single extension of the leading strands would yield a difference signal of 1 ± 0.11;
the two signals are distinguishable with better than 99.9% accuracy.
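The noise argument above can be reproduced in a few lines. This sketch assumes independent, normally distributed noise on each single-base signal, an exactly known correction signal (as the text assumes), and a decision threshold midway between the null and single-extension difference signals; the threshold and names are illustrative assumptions.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


SINGLE_SD = 0.05      # noise on one single-base trailing-strand signal (0.2 +/- 0.05)
REPEAT = 5            # trailing strands traverse a 5-base repeat
combined_sd = SINGLE_SD * math.sqrt(REPEAT)   # independent noise adds in quadrature

# Difference signals after subtracting the (assumed exact) correction:
#   no leading extension -> 0 +/- combined_sd; one leading extension -> 1 +/- combined_sd.
THRESHOLD = 0.5       # hypothetical decision point midway between 0 and 1
error_rate = upper_tail(THRESHOLD / combined_sd)   # same for either misreading
accuracy = 1.0 - error_rate

print(f"combined noise = {combined_sd:.2f}, per-call accuracy = {accuracy:.6f}")
```

Under these assumptions the combined noise is about 0.11 and the per-call accuracy comfortably exceeds 99.9%.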

[00152] The example given here is an extreme case: in fact, the extension failure can be 
corrected at any point, so that it will be possible to minimize the trailing strand population below 
a level where it would produce signals that make the leading strand sequence uncertain. 

[00153] There are additional advantages to the computer-aided monitoring method proposed. 
First, the signals from the trailing strands serve as an additional check on the leading strand 
sequence. Second, the trailing strand population could be allowed to surpass the leading strand 
population in magnitude. Without computer-aided monitoring, readout would have to cease well 
before this point, however, with computer-aided monitoring, readout can continue, now using the 
trailing strands rather than the leading strands to reveal the sequence. Thus, the strand population 
that trails due to only one extension failure now becomes the leading strand population for the
purposes of computer-aided monitoring. This allows readout to continue until further
complications arise from the occurrence of 2 extension failures on the same strand, producing a
new trailing strand population which can be tracked in the same way as the single failure strands,
while the population of strands that have undergone no extension failure diminishes to the point
where it contributes no detectable signal. 

[00154] Optimization of reagents, enzyme and reaction conditions should allow 
misincorporation probabilities below 1%, and extension failure probabilities as low as 0.1%. The 
computer-aided monitoring method of the present invention additionally provides a means for
healing the trailing strand population by selectively extending this population so that it is again
synchronous with the leading strands. For example, given a dNTP probe cycle of GCTA and a
template sequence (beyond the 3' end of the primer) of:

GTGCAGATCTG . . . 

and assuming that when dCTP is in the reaction chamber, the polymerase fails to incorporate a C 
in some fraction of the primer strands, the following results: 

Template GTG CAG ATC TG ... 

Main strands C 

Template GTG CAG ATC TG 

Failure strands 

At the end of the first cycle, the main strands have extended by ....CA, while the failure strand
has not advanced. After one more complete cycle, the main strand extension is ....CAC and the
failure strand now reads ....CA, i.e., it is now just one base out of phase.

Template GTG CAG ATC TG . . . 

Main strands CAC 
Template GTG CAG ATC TG ...

Failure strands CA 

Because the phase lag arises from the repeating interaction of the probe cycle sequence with the
template sequence, the unchanged probe cycle can never have the correct sequence to
resynchronize the strands. Instead, if the probe cycle is unchanged and no further extension
failures occur, the phase lag for a given failure strand oscillates perpetually between 1 and 3
bases, counting single-base repeats as one base for this purpose. However, because the leading
strand sequence up to the last extension is always known, one can determine the effect of
introducing an extension failure at some upstream position. It should be noted that an extension
failure introduced at any arbitrary upstream position, on any base type, always produces the same
phase lag, because the effect of an extension failure is to make extension of the affected strand
lag by one complete dNTP cycle. Thus, it is possible to alter the probe cycle sequence, for
example to probe with a C instead of a G after the last A in the sequence discussed above. The
failure strands would then advance while the main strands would not, and the phase lag would heal.
In yet another embodiment, the dNTP probe cycle may be reversed whenever the phase lag shrinks to
only 1 base. Whenever the phase difference declines to a single base, or to a repeat of a single base,
simply reversing the probe cycle sequence always resynchronizes the strands.
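The trailing-strand example above can be reproduced with a small flow simulator. This is a sketch under stated assumptions, not the method's implementation. Each dNTP flow fills in a whole single-base repeat. An extension failure is modeled as a population skipping one flow, here the first dCTP flow. "Reversing the probe cycle" is read as flowing ATCG immediately after the GCTA cycle. The names `PAIR`, `bases_added`, `run_flows` and `extension` are hypothetical.

```python
# Pairing rule: the dNTP incorporated opposite each template base.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def bases_added(template: str, pos: int, dntp: str) -> int:
    """Bases incorporated when `dntp` is flowed over a strand whose next
    unpaired template base is template[pos]; a whole repeat is filled in."""
    n = 0
    while pos + n < len(template) and PAIR[template[pos + n]] == dntp:
        n += 1
    return n


def run_flows(template: str, flows: str, skip=frozenset()) -> int:
    """Flow each dNTP in `flows` over one strand population and return how
    many template bases it has copied. Flow indices listed in `skip` model an
    extension failure: the strands do not incorporate during that flow."""
    pos = 0
    for i, dntp in enumerate(flows):
        if i not in skip:
            pos += bases_added(template, pos, dntp)
    return pos


def extension(template: str, pos: int) -> str:
    """Bases added to the primer after copying template[:pos]."""
    return "".join(PAIR[b] for b in template[:pos])


template = "GTGCAGATCTG"
cycle = "GCTA"
FAILED_FLOW = {1}  # hypothetical: the failure strands miss the first dCTP flow

main = run_flows(template, cycle * 2)
fail = run_flows(template, cycle * 2, skip=FAILED_FLOW)
print(extension(template, main), extension(template, fail))

# Healing option 1: probe with C instead of G after the last A.
heal_main = run_flows(template, cycle * 2 + "C")
heal_fail = run_flows(template, cycle * 2 + "C", skip=FAILED_FLOW)

# Healing option 2: reverse the probe cycle (ATCG) once the lag is one base.
rev_main = run_flows(template, cycle * 2 + cycle[::-1])
rev_fail = run_flows(template, cycle * 2 + cycle[::-1], skip=FAILED_FLOW)
print(heal_main == heal_fail, rev_main == rev_fail)
```

As written, the script prints `CAC CA` and then `True True`. After two cycles the failure strands lag by one base. For this template, both the substituted C probe and the reversed cycle bring them back into step with the main strands.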

[00155] Figure 15 shows how a leading strand population, arising from incorrect extension of a
fraction of primer strands due to nucleotide impurities, can adversely affect the signals from the
main population. Using the same template sequence as before:

[CTGA] GAA ACC AGA AA GTC C [TC AGT] and the same probe cycle, CAGT, one can simulate the
effect of a leading strand population that is 20% of the main strand population and is 2
bases ahead of the main strands at the time the main strand sequence begins to be read. The
leading strands have already extended by -C TTT. The first C probe extends the main primer
strands by one base complementary to the first G in the sequence, giving a single-base extension
signal of 1. The first G probe extends the leading strands by -GG-, complementary to the -CC- repeat,
giving a signal of 0.4. Greater ambiguity arises when the leading strands encounter the second
-AAA- repeat at the second T probe, increasing the main strand signal from the correct value for a
single-base extension to 1.6. In the absence of further information, this value will be ambiguous
or may be interpreted as a 2-base repeat.
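The superposition of main and leading strand signals in this example can be sketched by extending the same kind of flow simulator to a weighted mixture of populations. Several assumptions apply. The template is taken to be the unbracketed portion with spaces removed. The leading strands are placed 4 template bases ahead, since they have already added -C TTT. Signals are normalized so that a single-base extension of the main population gives 1. The function names are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def bases_added(template: str, pos: int, dntp: str) -> int:
    """Bases incorporated opposite template[pos:] during one `dntp` flow."""
    n = 0
    while pos + n < len(template) and PAIR[template[pos + n]] == dntp:
        n += 1
    return n


def flow_signals(template, flows, populations):
    """Per-flow signal for a mixture of strand populations.

    populations: list of (fraction, start) pairs, where `start` is how many
    template bases that population has already copied. The signal is the
    fraction-weighted number of bases incorporated, so a single-base
    extension of the main population (fraction 1.0) gives 1.
    """
    positions = [start for _, start in populations]
    signals = []
    for dntp in flows:
        total = 0.0
        for k, (fraction, _) in enumerate(populations):
            n = bases_added(template, positions[k], dntp)
            positions[k] += n
            total += fraction * n
        signals.append(round(total, 6))
    return signals


# Unbracketed part of the Figure 15 template, spaces removed (assumption).
template = "GAAACCAGAAAGTCC"
flows = "CAGT" * 2
# Main strands start at the first G; leading strands (20%) have already
# added -C TTT, i.e. copied 4 template bases.
mixed = flow_signals(template, flows, [(1.0, 0), (0.2, 4)])
clean = flow_signals(template, flows, [(1.0, 0)])
for dntp, m, c in zip(flows, mixed, clean):
    print(dntp, m, c)
```

The mixed trace shows 0.4 at the first G flow and 1.6 instead of 1 at the second T flow, matching the values described above. The clean column gives the main-strand-only signal for comparison.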

[00156] Correction for these ambiguities comes from the fact that the correct sequence of the
main strands is read after the leading strand read. In general, a large multiple repeat that
can give an error signal when encountered by the leading strands will subsequently give a larger
signal when encountered by the main strands. Superimposed on this correct signal will be a
leading strand signal, for which there are three possibilities: (i) zero signal: the leading strands do
not extend; (ii) a small signal that does not create ambiguity: the leading strands extend by a single
base or by a repeat number small enough not to simulate an additional base extension of the main
strands; (iii) a large signal: the leading strands encounter a second large repeat. By monitoring the
main strand sequence, it is possible at each extension to retroactively estimate the effects of a
leading strand population and subtract such signals from the main strand signals to arrive at a
correct sequence. Where the leading strands encounter a repeat large enough to create
ambiguity in the sequence, the main strands will eventually traverse the same region and give
sufficient information to derive the correct sequence. This holds even if the leading strands
encounter a second or third large repeat at the time the main strands encounter the first. In other
words, the sequence information at any point is always overdetermined: the signal for any given
extension is always read twice, once by the leading strands and once by the main strands. It therefore yields
sufficient information to determine both the correct sequence and the fractional population of the
leading strands, which are the two pieces of information required.
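One way to picture the retroactive correction is sketched below. This is a least-squares fit chosen for illustration, not necessarily the computation the invention uses. The sketch makes three assumptions. First, the main strands have already read the region, so the template is known. Second, the leading strands are hypothesized to be 4 template bases ahead. Third, the observed signals are assumed values that agree with the 1, 0.4 and 1.6 quoted for the Figure 15 example. The leading fraction f is fitted so that observed ≈ main + f × leading, and the fitted leading contribution is then subtracted. All names are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def bases_added(template: str, pos: int, dntp: str) -> int:
    """Bases incorporated opposite template[pos:] during one `dntp` flow."""
    n = 0
    while pos + n < len(template) and PAIR[template[pos + n]] == dntp:
        n += 1
    return n


def per_flow_counts(template, flows, start):
    """Bases incorporated at each flow by one population starting at `start`."""
    pos, counts = start, []
    for dntp in flows:
        n = bases_added(template, pos, dntp)
        pos += n
        counts.append(n)
    return counts


def fit_leading_fraction(observed, main_counts, lead_counts):
    """Least-squares f in observed ~= main_counts + f * lead_counts."""
    num = sum(lc * (o - m) for o, m, lc in zip(observed, main_counts, lead_counts))
    den = sum(lc * lc for lc in lead_counts)
    if den == 0:
        raise ValueError("leading strands never extend in these flows")
    return num / den


template = "GAAACCAGAAAGTCC"  # assumed known once the main strands have read it
flows = "CAGT" * 2
observed = [1.0, 0.0, 0.4, 3.2, 0.2, 0.0, 2.0, 1.6]  # assumed measured signals

main_counts = per_flow_counts(template, flows, start=0)
lead_counts = per_flow_counts(template, flows, start=4)  # hypothesized offset
f = fit_leading_fraction(observed, main_counts, lead_counts)

# Subtract the fitted leading-strand contribution and call repeat lengths.
corrected = [o - f * lc for o, lc in zip(observed, lead_counts)]
called = "".join(d * round(c) for d, c in zip(flows, corrected))
print(f"leading fraction ~ {f:.3f}; main strand extension: {called}")
```

With these inputs the fit recovers a leading fraction of 0.2. The corrected signals call the main strand extension CTTTGGT, the complement of GAAACCA. In practice the offset is also unknown, so several candidate offsets would be compared.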

[00157] Because the sequence of the leading strand population produced by impure nucleotides
cannot be known until it is subsequently traversed by the main strands, one cannot know which
dNTP probe cycle would extend the main strands without extending the leading strands,
as could be done for a trailing strand population produced by extension failure. However, as with
trailing strands, the gap between the leading and main strands oscillates perpetually between one
and three bases, and can be closed by reversing the dNTP probe sequence whenever the
gap between the leading and main strands shrinks to a single base. Although it cannot be known
when this single-base gap occurs, the dNTP probe sequence can be reversed at regular intervals.
Trials indicate that such a process ultimately reconnects approximately 2/3 of the leading strands.
The statistics for this process are as follows.

[00158] Statistically, because the gap between the main and leading strands can be 1, 2 or 3 
bases, there is a 1/3 probability that the leading strand population will have only a 1-base phase 
lag at any time the cycle is reversed. The 1-base phase difference will always be healed by a 
cycle reversal. Another 1/3 of the time the leading strands are 2 bases ahead at the time the cycle 
is reversed. For the next probing base the following possibilities exist: 
Lead     Main
strand   strand

0        0        No extension on either strand: Prob. 3/4 x 3/4 = 9/16
+1       0        Phase lag increases: Prob. 1/4 x 3/4 = 3/16
+1       +1       Both strands advance: Prob. 1/4 x 1/4 = 1/16
0        +1       Phase lag decreases: Prob. 3/4 x 1/4 = 3/16

Phase lag stays at 2: Number of chances = 10/16
Phase lag decreases: Number of chances = 3/16
Phase lag increases: Number of chances = 3/16

So the chance of making a 2-base gap worse is (3/16)/(10/16 + 3/16) = 3/13, about 23%.
Considering all three gap sizes:

1-base gap heals (33% of population)
2-base gap gets worse 23% of the time: only 1/3 of gaps are 2-base, so about 8% of the total get worse
3-base gap also gets worse 23% of the time, again about 8% overall

In sum, 33% heal at a given reversal, 15% lose at a given reversal, and the remaining 51% are
unchanged. Even assuming the 15% are permanently lost (and a 2-base gap increased to a 3-base
gap can still rejoin), at each subsequent reversal 1/3 of the strands left unchanged by the
previous reversal are healed, as follows:



Reversal #   Fraction of gaps healed
1            33%
2            17%
3            9%
4            4.5%
5            2.5%
6            1%
Total        ~67%



Therefore, repeated reversal of the dNTP probe cycle can reduce by 2/3 the effects of out-of-phase
signals due to incorrect extension by nucleotide impurities or to random extension failure,
effectively increasing the read length by a factor of 3 when it is limited by either effect.
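The back-of-envelope statistics of paragraph [00158] can be reproduced exactly with rational arithmetic. The sketch follows the text's simplified model as written. Each flow extends a given strand with probability 1/4, and the worsening ratio is computed the same way as in the text. One third of gaps are 1-base gaps, and one third of the remaining unchanged pool heals at every reversal. Worsened gaps are treated as permanently lost. Evaluated exactly, 1/4 x 3/4 is 3/16, so the worsening ratio comes out at 3/13. The variable and function names are illustrative only.

```python
from fractions import Fraction

P_EXTEND = Fraction(1, 4)  # model assumption: chance a strand extends on a given flow
P_STAY = 1 - P_EXTEND

# Outcomes for a 2-base gap at the next probe (lead strand, main strand).
p_neither = P_STAY * P_STAY     # 9/16, lag stays at 2
p_lag_up = P_EXTEND * P_STAY    # 3/16, lead advances alone
p_both = P_EXTEND * P_EXTEND    # 1/16, lag stays at 2
p_lag_down = P_STAY * P_EXTEND  # 3/16, main advances alone

# Worsening ratio computed as in the text: increase / (stay + increase).
p_worse = p_lag_up / ((p_neither + p_both) + p_lag_up)

p_heal = Fraction(1, 3)                 # 1-base gaps heal on reversal
p_lose = Fraction(2, 3) * p_worse       # 2- and 3-base gaps that get worse
p_unchanged = 1 - p_heal - p_lose


def healed_after(reversals: int):
    """Cumulative fraction healed after `reversals` reversals, plus the
    fraction healed at each one, assuming lost strands never rejoin."""
    remaining, healed, per_reversal = Fraction(1), Fraction(0), []
    for _ in range(reversals):
        h = remaining * p_heal
        per_reversal.append(h)
        healed += h
        remaining *= p_unchanged
    return healed, per_reversal


total6, per6 = healed_after(6)
print("worse:", p_worse, "lost:", p_lose, "unchanged:", p_unchanged)
print("per reversal:", [f"{float(h):.1%}" for h in per6])
print(f"after 6 reversals: {float(total6):.1%}")
```

Under these assumptions about 67% of gaps are healed after six reversals. The series converges to 13/19, about 68%, which agrees with the figure of roughly 2/3 given above.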

[00159] Although the invention has been described herein with reference to specific 
embodiments, many modifications and variations therein will readily occur to those skilled in the 
art. Accordingly, all such variations and modifications are included within the intended scope of 
the invention. 